PhysX tip: aggregates and MBP
Monday, January 23rd, 2017
This video shows 200 kinematic characters running around. There is no “physics” per se in this scene - zero dynamic objects, everything is kinematic and controlled by the user’s code. But the scene still takes a lot of time in PhysX, because all these shapes have to be updated in the broadphase structure.
This is a worst-case scenario for the default broadphase (SAP): all objects move all the time, and they are all located at the same altitude. As a result, the projections of the objects’ AABBs overlap a lot on the Y axis, which creates a lot of “swaps” in the structure, and this takes a lot of time to update.
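To make the cost of those “swaps” a bit more concrete, here is a minimal sketch of the re-sorting step an incremental sweep-and-prune has to run on one axis each frame. It is not the PhysX implementation: `resortAndCountSwaps` is a hypothetical helper, and it only counts the adjacent swaps an insertion sort needs to restore order after the endpoints have moved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restores sorted order on one axis with insertion sort and returns the
// number of adjacent swaps performed. With good frame-to-frame coherence the
// array is almost sorted and the count stays small; when many intervals
// overlap on the axis, small motions make many endpoints cross each other.
std::size_t resortAndCountSwaps(std::vector<float>& endpoints)
{
    std::size_t swaps = 0;
    for (std::size_t i = 1; i < endpoints.size(); ++i)
    {
        for (std::size_t j = i; j > 0 && endpoints[j - 1] > endpoints[j]; --j)
        {
            std::swap(endpoints[j - 1], endpoints[j]);
            ++swaps;
        }
    }
    // Postcondition: the axis is sorted again.
    assert(std::is_sorted(endpoints.begin(), endpoints.end()));
    return swaps;
}
```

Four endpoints that are still in order cost zero swaps, while the same four in fully reversed order cost six. With hundreds of boxes stacked at the same altitude, the Y axis behaves much more like the second case than the first.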
Of course this is an artificial scene, but it shows problems that do happen in real-world scenarios, in particular if we add all the extra bounds from the static environment. This is not shown in the video but there are other scenes in the combo box to test this case as well. The mockup static level looks like this:
So how do we make things run faster here? There are two main ways.
The first tip is to use “aggregates”. An aggregate is a collection of actors grouped together to form a single entry in the broadphase. In PhysX you should already be familiar with compound actors that group together multiple shapes within a single actor. It is the same idea: you can group together multiple actors within a single aggregate.
A typical use-case is a ragdoll / character, as shown in this video. In this example each character has 19 body parts, i.e. 19 actors. By default all these actors have a broadphase entry each. The body parts and their AABBs overlap each other quite a lot all the time, and this puts a lot of stress on the SAP broadphase. But if you put each character in its own aggregate, it suddenly creates 19 times fewer entries in the broadphase, and an overlap is only registered when two characters touch each other - i.e. when the white compound bounds shown in the video at 0:27 overlap each other.
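In numbers: 200 characters with 19 actors each means 3800 broadphase entries without aggregates, and only 200 with them. The sketch below shows the idea behind the compound bounds, under the assumption that an aggregate's broadphase entry is simply the box enclosing all of its parts. The `Aabb` type and both functions are hypothetical illustrations, not PhysX types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical axis-aligned box, stored as min/max per axis.
struct Aabb { float min[3]; float max[3]; };

// Smallest box enclosing every body part. This single box stands in for
// all the parts as far as the broadphase is concerned.
Aabb mergeBounds(const std::vector<Aabb>& parts)
{
    assert(!parts.empty());
    Aabb merged = parts.front();
    for (const Aabb& p : parts)
    {
        for (int axis = 0; axis < 3; ++axis)
        {
            merged.min[axis] = std::min(merged.min[axis], p.min[axis]);
            merged.max[axis] = std::max(merged.max[axis], p.max[axis]);
        }
    }
    return merged;
}

// Standard box-vs-box test: boxes overlap only if they overlap on all axes.
bool overlaps(const Aabb& a, const Aabb& b)
{
    for (int axis = 0; axis < 3; ++axis)
    {
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis])
            return false;
    }
    return true;
}
```

Two characters only generate a broadphase pair when `overlaps` is true for their merged bounds, no matter how much the individual body parts of each character overlap among themselves.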
If self-collisions or character-vs-character collisions are needed, additional tests are performed after the broadphase to take care of those. The code becomes more complex, since there is now a two-level hierarchy in the broadphase module, but the results are faster overall than putting everything naïvely in the broadphase. Most notably, when self-collisions within an aggregate are not needed, the filtering is done by testing a single bit (at aggregate level) instead of doing this for each overlapping pair within the aggregate. Generally speaking it is always a good idea to use aggregates, provided you don’t put thousands of actors in each of them.
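The single-bit filtering can be pictured with a small sketch of the second-level pass that runs after the broadphase. This is an illustration under simplifying assumptions (brute-force pair tests, hypothetical `Aabb` and `Aggregate` types), not the actual PhysX code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box, stored as min/max per axis.
struct Aabb { float min[3]; float max[3]; };

// Hypothetical aggregate: one flag plus the bounds of the actors inside.
struct Aggregate
{
    bool selfCollisions;
    std::vector<Aabb> parts;
};

static bool overlaps(const Aabb& a, const Aabb& b)
{
    for (int axis = 0; axis < 3; ++axis)
    {
        if (a.max[axis] < b.min[axis] || b.max[axis] < a.min[axis])
            return false;
    }
    return true;
}

// Counts overlapping pairs between actors of the same aggregate.
std::size_t countInternalPairs(const Aggregate& agg)
{
    // One flag test rejects every internal pair at once, instead of
    // running a filter for each overlapping pair.
    if (!agg.selfCollisions)
        return 0;

    std::size_t pairs = 0;
    for (std::size_t i = 0; i < agg.parts.size(); ++i)
    {
        for (std::size_t j = i + 1; j < agg.parts.size(); ++j)
        {
            if (overlaps(agg.parts[i], agg.parts[j]))
                ++pairs;
        }
    }
    return pairs;
}
```

The inner loops are quadratic in the number of actors per aggregate, which is one way to see why putting thousands of actors in a single aggregate is not a good idea.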
—
The second tip is to consider using “MBP” (for Multi Box Pruning). This is an alternative broadphase implementation that does not suffer from the same pitfalls as SAP, and it tends to be faster when a lot of objects are moving at the same time. On the other hand it is usually slower than SAP when few objects are moving, i.e. when the majority of the scene is sleeping. This implementation is based on my old box pruning code (but rewritten and much much faster), borrows ideas from the “multi-SAP” approach I described here, and then adds an additional layer of code to take care of sleeping objects. I wrote about it before and showed a demo of it, in a post where it was called “broadphase X”. Well, now you know, and you can grab the code on GitHub.
MBP currently works with user-defined regions, i.e. it is more tedious to set up than SAP - and that is one of the main reasons why it is not enabled by default. PEEL simply takes the scene’s global bounds and divides them into grid cells, which is usually a good default setup. A real game could do something more advanced, but it is not always needed. As you can see in the video, simply using a few default MBP grid cells has a large impact on performance: in this scene it is pretty much the same performance gain as using aggregates.
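A grid setup of that kind can be sketched as follows. This is not PEEL's code: `Aabb` and `makeGridRegions` are hypothetical, and the sketch assumes the cells only subdivide the two horizontal axes while keeping the full extent along the up axis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned box, stored as min/max per axis.
struct Aabb { float min[3]; float max[3]; };

// Splits 'world' into subdiv x subdiv cells over the two horizontal axes.
// Every cell keeps the full extent along 'upAxis' (0 = X, 1 = Y, 2 = Z).
std::vector<Aabb> makeGridRegions(const Aabb& world, int subdiv, int upAxis = 1)
{
    assert(subdiv > 0 && upAxis >= 0 && upAxis < 3);
    const int a0 = (upAxis + 1) % 3;  // first horizontal axis
    const int a1 = (upAxis + 2) % 3;  // second horizontal axis
    const float step0 = (world.max[a0] - world.min[a0]) / float(subdiv);
    const float step1 = (world.max[a1] - world.min[a1]) / float(subdiv);

    std::vector<Aabb> regions;
    regions.reserve(std::size_t(subdiv) * std::size_t(subdiv));
    for (int i = 0; i < subdiv; ++i)
    {
        for (int j = 0; j < subdiv; ++j)
        {
            Aabb cell = world;  // inherits the up-axis extent
            cell.min[a0] = world.min[a0] + step0 * float(i);
            cell.max[a0] = world.min[a0] + step0 * float(i + 1);
            cell.min[a1] = world.min[a1] + step1 * float(j);
            cell.max[a1] = world.min[a1] + step1 * float(j + 1);
            regions.push_back(cell);
        }
    }
    return regions;
}
```

In PhysX 3.x the resulting boxes would then be registered as broadphase regions on the scene (via `PxScene::addBroadPhaseRegion`, with `PxBroadPhaseType::eMBP` selected in the scene descriptor). Exact signatures depend on the SDK version, so check the headers of the release you use.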
Then one can of course use both aggregates and MBP. But it does not always help. In this particular test case for example (200 kinematic characters alone, no mockup static level), combining both does not lead to additional gains compared to simply using one. The performance with the various options looks like this:
YMMV and things will depend a lot on the scene’s configuration, the percentage of objects moving at any given time compared to the whole scene, etc.
In any case this post was just to introduce two options you can consider when broadphase performance becomes an issue: aggregates and MBP. Both of them have been used and shipped in AAA games, on PC as well as consoles like the PS4. They are viable options that you could experiment with.