Triangle configuration table

Here's a very old tip that might still be useful nowadays.

I was browsing the source code for Nick's very nice "Soft-Wired Shaders" COTD, when I noticed this in the rasterizer's triangle setup :

// Sort: V1 top, V2 middle, V3 bottom

if(V1->y > V3->y) swap(V1, V3);

if(V2->y > V3->y) swap(V2, V3);

if(V1->y > V2->y) swap(V1, V2);

This is a very common piece of code in rasterizers, yet some years ago it was considered "sub-optimal" for at least two reasons :

- FPU comparisons producing bad-looking assembly code

- branch misprediction penalties, since there's typically no coherence at this point of the pipeline, from one triangle to the next (triangle’s orientation is mostly random)

This is obviously not the bottleneck in Nick's code, and I'm not even sure old optimization rules still apply on today's machines, but the following tip might still be relevant.

You can replace these tests with a "configuration table", that gets rid of the two previously mentioned problems :

#define LEFT 0 // long edge on the left side of triangle

#define RIGHT 1 // long edge on the right side of triangle

#define DUMMY 255 // for padding

#pragma pack(1)

static ubyte gConfigTable[] =

{

0, 255, 255, 255, 255, 255, DUMMY, DUMMY,

RIGHT, 2, 1, 0, 1, 0, DUMMY, DUMMY,

RIGHT, 1, 0, 2, 0, 2, DUMMY, DUMMY,

LEFT, 1, 2, 0, 0, 2, DUMMY, DUMMY,

RIGHT, 0, 2, 1, 2, 1, DUMMY, DUMMY,

LEFT, 2, 0, 1, 1, 0, DUMMY, DUMMY,

LEFT, 0, 1, 2, 2, 1, DUMMY, DUMMY,

0, 255, 255, 255, 255, 255, DUMMY, DUMMY

};

#pragma pack()

To find triangle's configuration you simply index the table using a bitmask, computed from triangle’s Y coords :

#define SIGN_BITMASK 0x80000000

// Integer representation of a floating-point value.

#define IR(x) ((udword&)(x))

udword dy0 = (IR(V1->y) - IR(V2->y)) & SIGN_BITMASK;

udword dy1 = (IR(V2->y) - IR(V3->y)) & SIGN_BITMASK;

udword dy2 = (IR(V3->y) - IR(V1->y)) & SIGN_BITMASK;

udword Index = (dy0 >> 29)|(dy1 >> 30)|(dy2 >> 31) << 3;

ubyte Type = gConfigTable[Index+0];

ubyte Top = gConfigTable[Index+1];

ubyte Middle = gConfigTable[Index+2];

ubyte Bottom = gConfigTable[Index+3];

ubyte Right = gConfigTable[Index+4];

ubyte Left = gConfigTable[Index+5];

Note that it requires properly clipped TL-vertices, whose Y is positive, in order to work.

Then you pretty much know everything :

Type says if this is a LEFT or RIGHT triangle, i.e. if the long edge is on the right or on the left side of it.

Top, Middle and Bottom are sorted vertex indices, similar to what Nick’s code is computing in the original code.

As a free bonus you also get the indices of Right and Left vertices, that can be used to compute gradients without a single “if” or any special case. Resulting code is very short and clean.

For example, here’s how you can render a flat triangle :

void FlatRenderer::DrawTriangle(const Triangle* t)

{

SetFPUCeilMode(); // We let the casts do the ceiling for us

udword dy0 = (IR(t->mVerts[0].y) - IR(t->mVerts[1].y))&SIGN_BITMASK;

udword dy1 = (IR(t->mVerts[1].y) - IR(t->mVerts[2].y))&SIGN_BITMASK;

udword dy2 = (IR(t->mVerts[2].y) - IR(t->mVerts[0].y))&SIGN_BITMASK;

udword Index = ((dy0 >> 29)|(dy1 >> 30)|(dy2 >> 31)) << 3;

ubyte Type = gConfigTable[Index+0];

ubyte Top = gConfigTable[Index+1];

ubyte Middle = gConfigTable[Index+2];

ubyte Bottom = gConfigTable[Index+3];

ubyte Right = gConfigTable[Index+4];

ubyte Left = gConfigTable[Index+5];

// Compute #scanlines to go to the middle vertex

udword CeilYBottom = udword(t->mVerts[Bottom].y);

udword CeilYMiddle = udword(t->mVerts[Middle].y);

udword CeilYTop = udword(t->mVerts[Top].y);

udword Count = CeilYMiddle - CeilYTop;

// Prestep for subpixel accuracy

float PreStep = t->mVerts[Top].y - float(CeilYTop);

// Compute slopes and usual stuff

float dX[2];

dX[RIGHT] = (t->mVerts[Right].x - t->mVerts[Top].x) / (t->mVerts[Right].y - t->mVerts[Top].y);

dX[LEFT] = (t->mVerts[Left].x - t->mVerts[Top].x) / (t->mVerts[Left].y - t->mVerts[Top].y);

// Prestep slopes

float CurrentX[2];

CurrentX[RIGHT] = t->mVerts[Top].x - PreStep * dX[RIGHT];

CurrentX[LEFT] = t->mVerts[Top].x - PreStep * dX[LEFT];

// Draw triangle

ubyte* Dest = &mBuffer[CeilYTop*mPitch];

udword Nb=2;

while(Nb--)

{

while(Count)

{

sdword StartPix = sdword(CurrentX[LEFT]);

sdword NbPixels = (sdword)CurrentX[RIGHT] - StartPix;

udword* Buf = ((udword*)Dest)+StartPix;

while(NbPixels-->0)

{

*Buf++ = mColor;

}

CurrentX[LEFT]+=dX[LEFT];

CurrentX[RIGHT]+=dX[RIGHT];

Dest+=mPitch;

Count--;

}

Count = CeilYBottom - CeilYMiddle;

if(Count)

{

PreStep = t->mVerts[Middle].y - float(CeilYMiddle);

dX[Type] = (t->mVerts[Bottom].x - t->mVerts[Middle].x) / (t->mVerts[Bottom].y - t->mVerts[Middle].y);

CurrentX[Type] = t->mVerts[Middle].x - PreStep * dX[Type];

}

SetFPUFloorMode();

}

Notes :

- the above code is only included for information purpose, it may not be very safe. For example I don’t think it works correctly if you don’t compile using /QIfist.

- of course you don't have to change the FPU mode for each triangle... The above code was not written with speed in mind !

This tip might be obsolete nowadays, but I've never seen it explained anywhere (though I’m sure it’s old news to many of you). Somehow I felt it would be a good idea to share it nonetheless. I mainly used this in Phototracer's scanline (Phototracer is an old Photoshop plug-in), and before in various PC demos.

Pierre