Rendering the CPU time

How do you profile your code?

 

Nowadays, there are two usual ways:

- RDTSC
- VTune

 

RDTSC gives the number of cycles elapsed during the profiled sequence. For example, you may use it this way:

 

      udword NbCycles;
      StartProfile(&NbCycles);

      // ... code to profile ...

      EndProfile(&NbCycles);
      // NbCycles now contains the number of elapsed cycles

 

The functions used could be:

 

      inline void StartProfile(udword* val)
      {
            // RDTSC places the 64-bit cycle counter in EDX:EAX; we only keep
            // the low 32 bits, which is enough for short sequences.
            __asm {
                  rdtsc
                  mov         ebx, val          // ebx = pointer to the output value
                  mov         [ebx], eax        // *val = counter at the start
            }
      }

      inline void EndProfile(udword* val)
      {
            __asm {
                  rdtsc
                  mov         ebx, val          // ebx = pointer to the output value
                  sub         eax, [ebx]        // counter now - counter at the start
                  mov         [ebx], eax        // *val = number of elapsed cycles
            }
      }
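As a side note, compilers that provide the __rdtsc() intrinsic (recent versions of MSVC, for example, through <intrin.h>) let you read the same counter without inline assembly. A minimal sketch, not the code used in this article:

      #include <intrin.h>

      // Equivalent measurement with the __rdtsc() intrinsic; it returns the full
      // 64-bit counter, so there is no need to truncate to 32 bits.
      unsigned __int64 StartCycles = __rdtsc();
      // ... code to profile ...
      unsigned __int64 NbCycles = __rdtsc() - StartCycles;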

 

VTune is the horsepower solution: it gives you cycles, cache misses, and so on, basically everything. VTune is a great tool to play with, and it's really worth trying. Unfortunately, you must first install it and learn it, and it may just be a bit too complicated for basic needs.

 

Sometimes you don't really care about the exact number of cache misses or the exact number of cycles; you just want a coarse idea of how much CPU time a given piece of code is eating. Moreover, a raw cycle count is not very meaningful on its own, since what it represents depends on your computer's frequency. If RDTSC claims some routine takes 50000 cycles, so what? Is it too much, is it cheap? Without extra information, you just can't tell.
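If you do want to turn a raw cycle count into something meaningful, you have to bring in the CPU frequency yourself. A minimal sketch, where CPU_FREQUENCY is a hypothetical constant you would measure or hardcode for your machine, and udword is the unsigned 32-bit integer used in the rest of this article:

      // Hypothetical, machine-dependent constant; 500 MHz is only an example value.
      const float CPU_FREQUENCY = 500.0f * 1000000.0f;

      // Cycles => microseconds
      inline float CyclesToMicroseconds(udword nb_cycles)
      {
            return float(nb_cycles) * 1000000.0f / CPU_FREQUENCY;
      }

      // Cycles => percentage of a 60 Hz frame budget
      inline float CyclesToFramePercentage(udword nb_cycles)
      {
            return float(nb_cycles) * 100.0f * 60.0f / CPU_FREQUENCY;
      }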

 

That's why I'm usually fond of another way of profiling my code, which is a lot more visual: I don't count cycles, I count scanlines. This is actually a well-known method, but it has become quite obsolete since the generalization of 16-bit and 32-bit frame buffers… Why? Because the usual way to do it was to change the background color before and after the code to profile. Then you could directly see on screen how many scanlines it took.

 

Now you can't do that anymore: your frame buffer is 16 or 32 bits, and there's no palette you can instantly modify.

 

Fortunately, we can still get the same visual profiling thanks to a DirectDraw method: GetScanLine. As the name suggests, it returns the scanline currently being traced by the electron beam. Hence the recipe:

 

FirstLine = GetScanline();
// ... code to profile ...
LastLine = GetScanline();
NbScanlines = LastLine - FirstLine;
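Note that the actual DirectDraw call takes an output parameter and returns an HRESULT, so GetScanline() above would in practice be a small wrapper. A minimal sketch, assuming a valid IDirectDraw7 pointer called mDirectDraw (the member name is only an illustration):

      udword Renderer::GetScanline()
      {
            // IDirectDraw7::GetScanLine() fills in the scanline currently being drawn;
            // it returns DDERR_VERTICALBLANKINPROGRESS during the vertical blank.
            DWORD ScanLine = 0;
            if(FAILED(mDirectDraw->GetScanLine(&ScanLine)))
                  return 0;     // vertical blank or failure: report scanline 0
            return udword(ScanLine);
      }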

 

Then you just have to render the elapsed CPU time as a standard TLVERTEX quad, whose height in pixels is the number of scanlines consumed:

 

 

bool Renderer::DrawCPUTime(udword y)
{
      TLVertex    Verts[4];         // Vertices for a rectangle
      uword       Indexes[6];       // Indices

      // Initialize the vertices
      float sx     = (float)mRenderWidth;
      float sy     = (float)(mLastScanline - mFirstScanline);
      float ystart = float(y);

      Verts[0].p.x       = 0.0f;
      Verts[0].p.y       = ystart+sy;
      Verts[0].p.z       = 0.0f;
      Verts[0].rhw       = 1.0f;
      Verts[0].color     = 0x7fffffff;
      Verts[0].specular  = 0;
      Verts[0].u         = 0.0f;
      Verts[0].v         = 1.0f;

      Verts[1].p.x       = 0.0f;
      Verts[1].p.y       = ystart;
      Verts[1].p.z       = 0.0f;
      Verts[1].rhw       = 1.0f;
      Verts[1].color     = 0x7fffffff;
      Verts[1].specular  = 0;
      Verts[1].u         = 0.0f;
      Verts[1].v         = 0.0f;

      Verts[2].p.x       = 0.0f+sx;
      Verts[2].p.y       = ystart+sy;
      Verts[2].p.z       = 0.0f;
      Verts[2].rhw       = 1.0f;
      Verts[2].color     = 0x7fffffff;
      Verts[2].specular  = 0;
      Verts[2].u         = 1.0f;
      Verts[2].v         = 1.0f;

      Verts[3].p.x       = 0.0f+sx;
      Verts[3].p.y       = ystart;
      Verts[3].p.z       = 0.0f;
      Verts[3].rhw       = 1.0f;
      Verts[3].color     = 0x7fffffff;
      Verts[3].specular  = 0;
      Verts[3].u         = 1.0f;
      Verts[3].v         = 0.0f;

      // Initialize the indices
      Indexes[0] = 0;
      Indexes[1] = 1;
      Indexes[2] = 2;
      Indexes[3] = 2;
      Indexes[4] = 1;
      Indexes[5] = 3;

      // Setup render states and draw the quad as two triangles
      mRS->SetLighting(false);
      mRS->SetAlphaBlending(false);
      mRS->SetTexture(null);
      mRS->SetMaterial();
      mRS->SetCullMode(CULL_NONE);
      return DrawIndexedPrimitive(PRIMTYPE_TRILIST, VF_XYZRHW|VF_DIFFUSE|VF_SPECULAR|VF_TEX1, Verts, 4, Indexes, 6);
}
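Putting it together, the scanline counters just have to be sampled around the code you want to profile, before calling DrawCPUTime(). A minimal sketch of a frame loop, where RenderFrame() and UpdateAndRenderScene() are only placeholders for your own code; mFirstScanline and mLastScanline are the members used by DrawCPUTime() above:

void Renderer::RenderFrame()
{
      // Sample the scanline counter around the work we want to measure
      mFirstScanline = GetScanline();

      UpdateAndRenderScene();       // placeholder: the code to profile

      mLastScanline = GetScanline();

      // Draw the CPU-time bar at the top of the screen:
      // its height equals the number of scanlines eaten by the profiled code
      DrawCPUTime(0);
}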

 

 

Unfortunately, I don't know how to do the same with OpenGL.

 

Historical notes:

The first game I saw using that method was Goldrunner, by Steve Bak, in 1987. You had to press the F10 key to discover some strange dancing rasters on your screen: the actual CPU time used.

Pierre Terdiman