Triangle configuration table
Here's a very old tip that might still be useful nowadays.
I was browsing the source code for Nick's very nice "Soft-Wired
Shaders" COTD, when I noticed this in the rasterizer's triangle setup
:
//
Sort: V1 top, V2 middle, V3 bottom
if(V1->y
> V3->y) swap(V1, V3);
if(V2->y
> V3->y) swap(V2, V3);
if(V1->y
> V2->y) swap(V1, V2);
This is a very common piece of code in rasterizers, yet some years ago
it was considered "sub-optimal" for at least two reasons :
-
FPU comparisons producing bad-looking assembly code
-
branch misprediction penalties, since there's typically no coherence at
this point of the pipeline, from one triangle to the next (triangle’s
orientation is mostly random)
This is obviously not the bottleneck in Nick's code, and I'm not even
sure old optimization rules still apply on today's machines, but the following
tip might still be relevant.
You can replace these tests with a "configuration table", that
gets rid of the two previously mentioned problems :
#define LEFT 0 // long
edge on the left side of triangle
#define RIGHT 1 // long edge
on the right side of triangle
#define DUMMY
255 // for padding
#pragma pack(1)
static ubyte gConfigTable[] =
{
0, 255, 255, 255, 255, 255, DUMMY, DUMMY,
RIGHT, 2, 1, 0, 1, 0, DUMMY, DUMMY,
RIGHT, 1, 0, 2, 0, 2, DUMMY, DUMMY,
LEFT, 1, 2, 0, 0, 2, DUMMY, DUMMY,
RIGHT, 0, 2, 1, 2, 1, DUMMY, DUMMY,
LEFT, 2, 0, 1, 1, 0, DUMMY, DUMMY,
LEFT, 0, 1, 2, 2, 1, DUMMY, DUMMY,
0, 255, 255, 255, 255, 255, DUMMY, DUMMY
};
#pragma pack()
To find triangle's configuration you simply index the table using a
bitmask, computed from triangle’s Y coords :
#define SIGN_BITMASK 0x80000000
// Integer representation of a floating-point
value.
#define IR(x) ((udword&)(x))
udword dy0 = (IR(V1->y) - IR(V2->y))
& SIGN_BITMASK;
udword dy1 = (IR(V2->y) - IR(V3->y))
& SIGN_BITMASK;
udword dy2 = (IR(V3->y) - IR(V1->y))
& SIGN_BITMASK;
udword Index = (dy0 >> 29)|(dy1 >>
30)|(dy2 >> 31) << 3;
ubyte Type =
gConfigTable[Index+0];
ubyte Top =
gConfigTable[Index+1];
ubyte Middle =
gConfigTable[Index+2];
ubyte Bottom =
gConfigTable[Index+3];
ubyte Right =
gConfigTable[Index+4];
ubyte Left =
gConfigTable[Index+5];
Note that it requires properly clipped TL-vertices, whose Y is positive,
in order to work.
Then you pretty much know everything :
Type says if this is a LEFT or RIGHT triangle, i.e. if the long edge is on
the right or on the left side of it.
Top, Middle and Bottom are sorted vertex indices, similar to
what Nick’s code is computing in the original code.
As a free bonus you also get the indices of Right and Left
vertices, that can be used to compute gradients without a single “if” or any
special case. Resulting code is very short and clean.
For example, here’s how you can render a flat triangle :
void FlatRenderer::DrawTriangle(const Triangle*
t)
{
SetFPUCeilMode(); // We let the casts do the
ceiling for us
udword dy0 =
(IR(t->mVerts[0].y) - IR(t->mVerts[1].y))&SIGN_BITMASK;
udword dy1 =
(IR(t->mVerts[1].y) - IR(t->mVerts[2].y))&SIGN_BITMASK;
udword dy2 =
(IR(t->mVerts[2].y) - IR(t->mVerts[0].y))&SIGN_BITMASK;
udword
Index = ((dy0 >> 29)|(dy1 >> 30)|(dy2 >> 31)) << 3;
ubyte
Type = gConfigTable[Index+0];
ubyte
Top = gConfigTable[Index+1];
ubyte
Middle = gConfigTable[Index+2];
ubyte
Bottom = gConfigTable[Index+3];
ubyte
Right = gConfigTable[Index+4];
ubyte
Left = gConfigTable[Index+5];
//
Compute #scanlines to go to the middle vertex
udword
CeilYBottom =
udword(t->mVerts[Bottom].y);
udword
CeilYMiddle =
udword(t->mVerts[Middle].y);
udword
CeilYTop =
udword(t->mVerts[Top].y);
udword
Count =
CeilYMiddle - CeilYTop;
//
Prestep for subpixel accuracy
float
PreStep = t->mVerts[Top].y - float(CeilYTop);
//
Compute slopes and usual stuff
float
dX[2];
dX[RIGHT]
= (t->mVerts[Right].x - t->mVerts[Top].x) / (t->mVerts[Right].y -
t->mVerts[Top].y);
dX[LEFT]
= (t->mVerts[Left].x - t->mVerts[Top].x) / (t->mVerts[Left].y -
t->mVerts[Top].y);
//
Prestep slopes
float
CurrentX[2];
CurrentX[RIGHT]
= t->mVerts[Top].x - PreStep * dX[RIGHT];
CurrentX[LEFT]
= t->mVerts[Top].x - PreStep * dX[LEFT];
//
Draw triangle
ubyte*
Dest = &mBuffer[CeilYTop*mPitch];
udword
Nb=2;
while(Nb--)
{
while(Count)
{
sdword StartPix =
sdword(CurrentX[LEFT]);
sdword NbPixels =
(sdword)CurrentX[RIGHT] - StartPix;
udword* Buf =
((udword*)Dest)+StartPix;
while(NbPixels-->0)
{
*Buf++
= mColor;
}
CurrentX[LEFT]+=dX[LEFT];
CurrentX[RIGHT]+=dX[RIGHT];
Dest+=mPitch;
Count--;
}
Count = CeilYBottom -
CeilYMiddle;
if(Count)
{
PreStep = t->mVerts[Middle].y
- float(CeilYMiddle);
dX[Type] = (t->mVerts[Bottom].x -
t->mVerts[Middle].x) / (t->mVerts[Bottom].y - t->mVerts[Middle].y);
CurrentX[Type]
= t->mVerts[Middle].x - PreStep * dX[Type];
}
}
SetFPUFloorMode();
}
Notes :
-
the above code is only included for information purpose, it may not be
very safe. For example I don’t think it works correctly if you don’t
compile using /QIfist.
-
of course you don't have to change the FPU mode for each triangle... The
above code was not written with speed in mind !
This tip might be obsolete nowadays, but I've never seen it explained
anywhere (though I’m sure it’s old news to many of you). Somehow I felt it
would be a good idea to share it nonetheless. I mainly used this in
Phototracer's scanline (Phototracer is an old Photoshop plug-in), and before in
various PC demos.
Pierre