RAPID Hack

Pierre Terdiman

Previous revision : May, 2000

Last revision : August, 2002

I) Introduction

This little document presents several modifications of the RAPID library I implemented in my own collision detection system (Z-Collide).

All those modifications were made in order to reduce the memory requirements of RAPID 2.01 [1]. Although very rapid indeed, the RAPID library unfortunately consumes a very large amount of memory (about 412 bytes per triangle, as reported in the original OBB Tree paper). Further research papers have actually used that major flaw to emphasize how well their new system behaved compared to RAPID. (see QuickCD [2] for example) As a matter of facts, the original RAPID library is very easy to optimize – provided we’re speaking about the number of bytes required.

II) Preliminaries

Here’s the box class found in obb.H :

class box

{

public:

// placement in parent's space

// box to parent space: x_m = pR*x_b + pT

// parent to box space: x_b = pR.T()*(x_m - pT)

double pR[3][3];

double pT[3];

// dimensions

double d[3]; // this is "radius", that is,

// half the measure of a side length

box *P; // points to but does not "own".

box *N;

tri *trp;

int leaf() { return (!P && !N); }

double size() { return d[0]; }

int split_recurse(int *t, int n);

int split_recurse(int *t); // specialized for leaf nodes

};

We have :

9 doubles for the rotation part, 72 bytes

3 doubles for the translation part, 24 bytes

3 doubles for the box extents, 24 bytes

1 box pointer (positive child), 4 bytes

1 box pointer (negative child), 4 bytes

1 triangle pointer, 4 bytes

Total : 132 bytes for each box. Total number of box is 2*number of triangles – 1, so we need approximately 132 * 2 = 264 bytes per triangle.

This doesn’t take into account the actual triangles. In the original paper those triangles are made of 3 vertices, or 9 doubles, ie 72 bytes. I assume some standard indexed vertices may be used instead, and a triangle can just be made of 3 word indices (6 bytes) or dword indices (12 bytes). This has no importance : since the geometry is indexed, it always exists somewhere as actual rendering data, and that’s why I don’t take it into account for the OBBTree memory requirements.

III) Trivial modifications

1) Using single-precision floating-point values

This is of course the most trivial modification. Just replace all the doubles with floats, it works just as well. In theory, this may conduct to a loss of accuracy while recursively performing the collision queries. In practice, I’ve never noticed any differences between the double and the single precision version.

Anyway let’s quote the original paper:

« Single precision arithmetic can also be used to save memory. »

Indeed. And our box now needs 132 – 36 – 12 – 12 = 72 bytes.

2) Wiping the negative box pointer out

Ok, I like RAPID very much. Honestly, I love that library, and the algorithms behind. But when it comes to the implementation part, well, I’m puzzled. What were they thinking ?

The negative box pointer actually is useless.

Here’s an excerpt from RAPID in build.cpp :

P = RAPID_boxes + RAPID_boxes_inited++;

N = RAPID_boxes + RAPID_boxes_inited++;

So what ?

The negative box will always be found just after the positive one, and the negative box pointer can be made implicit. In other words, don’t store it ! And when you need it in collide.cpp, just do :

N = P + 1 ;

Trust me, it works.

And the leaf test becomes :

int leaf() { return (!P); }

…but don’t use that one, keep reading…

68 bytes.

3) Wiping the triangle pointer out

The triangle pointer is useless as well. It is always null, except for leaf nodes. And leaf nodes, of course, don’t have any positive or negative children since they’re leaf nodes. Hence, P and N are null for leaf nodes.

Obvious conclusion : we can just pack the triangle pointer in the P pointer (N doesn’t exist anymore since 2)).

In order to do so, replace in build.cpp :

trp = ptr;

with

P = (box*)(unsigned int(ptr) | 1);

Why setting the least significant bit ? Because we need a way to detect leaf nodes anyway. The new leaf test is :

int leaf() { return unsigned int(P) & 1; }

Of course, I assume all box and triangle pointers have their original least significant bit set to zero. This is a very cheap assumption, since it just suppose data are 2-bytes aligned. This should always happen, and in case it doesn’t, forcing the alignment should be done anyway, for pure efficiency reasons.

64 bytes.

IV) Non-trivial modifications

1) Using quaternions

The rotation is stored with a 3*3 matrix. Let’s quote the original paper :

« Using quaternions (…) results in substancial space savings, but need 13 more operations per OBB overlap test. »

This is the usual trade-off between memory and CPU time. Since we now use single-precision arithmetic, we actually already have won some CPU time, and we can afford some more operations per overlap test.

Net saving : 4 floats instead of 9, we’re down to 44 bytes.

2) Wiping W out

Orientations are encoded by unit quaternions, so we can safely discard one of the quaternion component (say W), because it’s implicit :

x^2 + y^2 + z^2 + w^2 = 1

ó w^2 = 1 – x^2 - y^2 - z^2

=> w = + / - sqrt(1 – x^2 - y^2 - z^2)

We need to store one extra bit to make the difference between W and –W. Fortunately there are a lot of places where we can store that extra bit :

- in the second least significant bit of the P / triangle pointer, which is just assuming data are aligned on 4-bytes boundaries

- in the sign bit of any of the 3 box extents – which are of course always positive, so 3 bits are wasted there.

That modification also introduces an extra square-root, but nowadays it’s becoming really cheap. If needed we may use square root approximations, or even some assembly code.

Another approach would have been to use Euler angles only.

Anyway, 40 bytes, which is 3.3 times better than the original code.

Updated august, 1, 2002 :

Stefano Bacelle sent a cool trick to get rid of the extra bit above : unit quaternions Q and -Q actually map the same rotation matrix. So, we don't need to save the extra bit, we can just force W to be positive by negating the whole quat if it is not. Works like the proverbial charm. This definitely makes sense since Euler angles only need 3 floats to encode an orientation. So that extra bit was redundant, one way or another. I just never realized it was so simple to discard it. Thanks, Stefano !

3) Realtime compression

The OBB tree takes a lot of memory, but only a little subset is actually needed to resolve collision queries at runtime. It should be possible to keep the OBB tree compressed, and only decompress needed branches on the fly – which of course saves a lot of memory, but needs some more CPU time.

Used in a standard rigid-body simulator, this method should be efficient, because we often need to make multiple collision queries in search of the exact collision time. If we use temporal coherence, we can decompress a given branch the first time it’s needed, then keep it in a local cache when searching for the exact collision time.

Anyway, this is something I haven’t actually tried – yet.

Updated august, 1, 2002 :

As promised I tried the approach in my own library, Opcode.