Rendering the CPU time
How do you profile your code? Nowadays there are two usual ways:
- RDTSC
- VTune

RDTSC gives the number of elapsed cycles during the profiled sequence. For example, you may use it this way:
udword NbCycles;
StartProfile(&NbCycles);
… profiled code …
EndProfile(&NbCycles);   // Get the number of elapsed cycles in NbCycles
The functions used could be:
inline void StartProfile(udword* val)
{
    __asm {
        rdtsc               // read the time-stamp counter into EDX:EAX
        mov ebx, val        // ebx = destination pointer
        mov [ebx], eax      // store the low 32 bits of the counter
    }
}

inline void EndProfile(udword* val)
{
    __asm {
        rdtsc               // read the time-stamp counter again
        mov ebx, val        // ebx = destination pointer
        sub eax, [ebx]      // subtract the start value...
        mov [ebx], eax      // ...and store the number of elapsed cycles
    }
}
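Note that RDTSC actually returns a 64-bit value in EDX:EAX; the code above only keeps the low 32 bits, which is enough for short sequences. For what it's worth, here is a minimal sketch of an equivalent version using the __rdtsc compiler intrinsic instead of inline assembly (the intrinsic is an assumption on my part: recent MSVC versions expose it through <intrin.h>, older ones don't):

    #include <intrin.h>   // for __rdtsc - assumed available on your compiler

    // Sketch only: same idea as StartProfile/EndProfile, but keeping the full 64-bit counter.
    inline unsigned __int64 StartProfile64()
    {
        return __rdtsc();                   // read the time-stamp counter
    }

    inline unsigned __int64 EndProfile64(unsigned __int64 start_cycles)
    {
        return __rdtsc() - start_cycles;    // elapsed cycles since StartProfile64()
    }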
VTune is the horsepower solution: it gives you cycles, cache misses, and basically everything else. VTune is a great tool to play with, and it's really worth trying. Unfortunately, you must first install it and learn it, and it may just be a bit too complicated for basic needs.
Sometimes you don't really care about the exact number of cache misses or the exact number of cycles; you just want a coarse idea of how much CPU time you're eating in a given piece of code. Moreover, a raw cycle count is not very meaningful on its own, and greatly depends on your computer's frequency. If RDTSC claims some routine takes 50000 cycles, so what? Is it too much? Is it cheap? Without extra information, you just can't tell.
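If you do want to turn a cycle count into actual time, you have to bring in the CPU frequency yourself. A minimal sketch, assuming you know (or have measured) your machine's clock speed:

    // CPU_FREQUENCY_MHZ is an assumption: hard-code or measure your machine's clock speed.
    const float CPU_FREQUENCY_MHZ = 500.0f;   // e.g. a 500 MHz CPU

    // Converts a cycle count into microseconds (1 MHz = 1 cycle per microsecond).
    inline float CyclesToMicroseconds(udword nb_cycles)
    {
        return float(nb_cycles) / CPU_FREQUENCY_MHZ;   // 50000 cycles at 500 MHz = 100 microseconds
    }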
That's why I'm usually fond of another way of profiling my code, which is a lot more visual. I don't count cycles, I count scanlines. This is actually a well-known method, but it has become quite obsolete since the generalization of 16-bit and 32-bit frame buffers… Why? Because the usual way to do it was to change the background color before and after the code to profile. Then you could directly see on screen how many scanlines it took.
Now you can't do that anymore, because your framebuffer is 16 or 32 bits and there's no palette you can instantly modify.
Fortunately, we can still get the same visual profiling thanks to a DirectDraw method: GetScanLine. As the name suggests, it returns the current scanline being traced by the electron beam. Hence the recipe:
FirstLine = GetScanline();
…
…code to profile…
…
LastLine = GetScanline();
NbScanlines = LastLine - FirstLine;
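As a side note, here is what a GetScanline() wrapper might look like on top of IDirectDraw7::GetScanLine (the mDirectDraw member is an assumption of mine; the method itself is standard DirectDraw 7):

    #include <ddraw.h>

    // Sketch of a thin wrapper around IDirectDraw7::GetScanLine().
    // mDirectDraw is assumed to be a valid, initialized IDirectDraw7 pointer.
    udword Renderer::GetScanline()
    {
        DWORD ScanLine = 0;
        // The call fails with DDERR_VERTICALBLANKINPROGRESS during the vertical blank;
        // in that case we simply report scanline 0.
        if(FAILED(mDirectDraw->GetScanLine(&ScanLine)))   return 0;
        return udword(ScanLine);
    }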
Then you just have to render the elapsed CPU time as a standard TLVERTEX quad:
bool Renderer::DrawCPUTime(udword y)
{
    TLVertex Verts[4];    // Vertices for a rectangle
    uword Indexes[6];     // Indices

    // Initialize the vertices
    float sx = (float)mRenderWidth;
    float sy = (float)(mLastScanline - mFirstScanline);
    float ystart = float(y);

    Verts[0].p.x = 0.0f;
    Verts[0].p.y = ystart+sy;
    Verts[0].p.z = 0.0f;
    Verts[0].rhw = 1.0f;
    Verts[0].color = 0x7fffffff;
    Verts[0].specular = 0;
    Verts[0].u = 0.0f;
    Verts[0].v = 1.0f;

    Verts[1].p.x = 0.0f;
    Verts[1].p.y = ystart;
    Verts[1].p.z = 0.0f;
    Verts[1].rhw = 1.0f;
    Verts[1].color = 0x7fffffff;
    Verts[1].specular = 0;
    Verts[1].u = 0.0f;
    Verts[1].v = 0.0f;

    Verts[2].p.x = 0.0f+sx;
    Verts[2].p.y = ystart+sy;
    Verts[2].p.z = 0.0f;
    Verts[2].rhw = 1.0f;
    Verts[2].color = 0x7fffffff;
    Verts[2].specular = 0;
    Verts[2].u = 1.0f;
    Verts[2].v = 1.0f;

    Verts[3].p.x = 0.0f+sx;
    Verts[3].p.y = ystart;
    Verts[3].p.z = 0.0f;
    Verts[3].rhw = 1.0f;
    Verts[3].color = 0x7fffffff;
    Verts[3].specular = 0;
    Verts[3].u = 1.0f;
    Verts[3].v = 0.0f;

    // Initialize the indices
    Indexes[0] = 0;
    Indexes[1] = 1;
    Indexes[2] = 2;
    Indexes[3] = 2;
    Indexes[4] = 1;
    Indexes[5] = 3;

    // Setup render states and draw the quad
    mRS->SetLighting(false);
    mRS->SetAlphaBlending(false);
    mRS->SetTexture(null);
    mRS->SetMaterial();
    mRS->SetCullMode(CULL_NONE);

    return DrawIndexedPrimitive(PRIMTYPE_TRILIST, VF_XYZRHW|VF_DIFFUSE|VF_SPECULAR|VF_TEX1, Verts, 4, Indexes, 6);
}
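To put it all together, a typical frame could look like the following sketch (mFirstScanline and mLastScanline are the members read by DrawCPUTime; the surrounding calls are hypothetical):

    // Sketch of a typical frame using the visual profiler.
    void Renderer::RenderFrame()
    {
        mFirstScanline = GetScanline();    // where the beam is before the profiled code

        UpdateAndRenderScene();            // ...hypothetical code to profile...

        mLastScanline = GetScanline();     // where the beam is afterwards

        DrawCPUTime(10);                   // draw the semi-transparent bar, 10 pixels from the top
    }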
Unfortunately, I don't know how to do the same with OpenGL.
Historical notes:
The first game I saw using that method was Goldrunner, by Steve Bak, in 1987. You had to press the F10 key to discover some strange dancing rasters on your screen: the actual CPU time used.
Pierre Terdiman