The early days (8/10) - Rising Force

May 20th, 2012

The Rising Force episode is one of my fondest coding-related memories. It happened at a coding-party, the Crystal Summer Convention II (CSC2). The Rising Force name obviously comes from Yngwie Malmsteen’s band. Let’s just say that at the time, we were fans. And it seemed to fit us like a glove. (For the record, I first learnt about Malmsteen in a computer magazine, Micro News.)

Weary of war against competing groups, we had released Choice Of Gods without further ado. Without special announcement, without advertising, not even in the context of a coding-party – we simply sent it to a few swappers and called it a day. We were isolated from the small world of the demo scene, not aware of its gossip, rumors and stories. For the most part we stayed in our little corner. The Rising Force case was different. For the first time we went to a coding-party with a new creation, with the explicit goal of winning the demo-competition that took place there. We were going to witness people’s reactions live, on the spot, in realtime…

When we got there, we already had the screens. There wasn’t much left to code: we basically just had to put everything together, stitch one screen to the next, transfer that to a disk, done. And we naively thought we would wrap it up in a few hours. That was a fatal mistake! It may sound a bit illogical, but I think it is notoriously difficult to code efficiently during a coding-party… except we did not know that yet. It is impossible to code at a coding-party because too many extraordinary things happen all around. Too many people to meet, too much to say, too much to see. How can you focus on a task when speakers and decibels explode around you? When dozens of people constantly interrupt you for a chat? How do you track and kill a particularly nasty bug, which requires discipline and calm, in the midst of chaos?

You guessed it, what had to happen did in fact happen: throughout this party I faced a bug… wait, no, not just “a” bug, it was the mother of all bugs. That was Murphy’s Law at its best: the bug that never happens, except when it can be the most damaging.

Before we arrived, the organizers had already collected all competing demos, and thus they more or less already knew who would ultimately reach the podium. Oxygene, an already famous ST group, had joined the party with a finished demo, worthy of them. And then there was Nucleus/HMD as a challenger, with a demo called Phototro. He was fighting like hell to finish it on time, and his goal was, for once, to beat Oxygene at their own game.

And then, there was us. We arrived there after everyone else, but we did arrive anyway. And the small and little-known group named Holocaust was about to become a huge, a giant grain of sand in the well-oiled Oxygene machinery.

To be fair, we only had a few scraps of demo, small pieces that we tried to pick up and put together. Design was lacking. It did not look polished or anything. Nonetheless, it was more than enough. Because of our isolation, because we had stayed away from “the scene”, we had never fully realized how good or how bad we were, compared to the competition. And thus what happened when one of the CSC2 people checked out our stuff was a genuine surprise to us.

Unforgettable: here we are, Elric and myself, showing a few screens to one of the jury members. The man in front of us, initially imperturbable, does not pay much attention. And then you see a spark of interest. And then something like a shock. His smile melts away. I think he realizes that the two scruffy mutants in front of him may be more than just Sunday coders after all. One of our specialties appears on the screen: a twisted mix of precomputed delta-compressed vectors, spectacular FlexiScroller, real-time 3D, all of this running at 60 Hz with, icing on the cake for us, a tiny touch of design to seal the deal. The man in front of us is stunned. He stammers out a few words, and without further ado he walks away, starts to rush behind the scenes, disappears. I suppose he went to tell the other guys that there might be a slight change of plans…

Elric and I are stunned as well. Receiving messages on the Minitel praising Choice of Gods is one thing. But this right here is something else entirely. It is much more striking, much more convincing: it is real, it is live. At that moment I started to realize, gradually, that we had gone beyond what I had imagined. It felt like not only had we reached the same level as our models and heroes, but we had in fact gone even further. And maybe we were in turn playing the role of pioneers, adventurers discovering new horizons and new possibilities.

While Elric follows the man behind the scenes to show off the rest of the demo (I am told this was epic!) I focus on the tedious bug that has kept me busy for several hours now. I try everything I can. I check everything. I recompile 100 times. I rant. I swear and I sweat. The organizers, friendly enough, push back the deadline for us, hour after hour. In vain. Meanwhile, I am in hell. As said before, I do not like doing things halfway. I am passionate. I put everything I have, all my heart and all my soul, into what I do. And bloody hell, that day I thought I’d go mad. Mad with rage against this stupid machine that crashed constantly. I wanted to scream, to hit something, to throw the keyboard against a wall. Nothing worked, nothing. It was incomprehensible. A minor innocent change could make everything crash for no apparent reason. Or rather, there was a “reason” in a way: Murphy. Since we had an opportunity to beat Oxygene, HMD and all the others, there had to be something blocking our way, right? That is the Law. So time passed. And passed. And then so did the deadline. Disgusted, enraged, I had tears in my eyes: that kind of bug had never happened before! Never like this! It only happened, of course, when the stakes were high.

The competition took place in a small room, one floor down. People were starting to move, going there first to grab good seats. My eyes had not left the keyboard a single moment since the day before. And they still could not. I could not give up. Never! Not as long as I breathe! I persisted. Minutes went by. The last remaining people left the room. The contest was about to start. Bitter, without much faith in it, I tried a final change… Will anybody ever believe it? It was there, and it was obvious. It was right there in the middle of the unpacking routine, which we did not write ourselves and which I would never have suspected. There it stood, proud and stupid, the most destructive bug I have ever faced. I compiled in haste, cold sweat running down my spine, and I ran the demo.

It worked. It was unreal. All the changes made since the previous day, all the pieces of the puzzle fell into place. The demo was running. I could not believe it. It was insane. It was the worst scenario I could have imagined, the meanest joke ever: everything worked, just a tiny bit too late. It was fucking unbelievable.

Too late? I was pumped full of adrenaline. I was over-excited. Disgusted as I have never been, vengeful, willing to try anything to tame this… this… this fucking bullshit piece of a demo! A second of eternity. And action! I never acted so fast in my life. Fingers fly over the keyboard. Floppy disks are inserted in the drive at the speed of light. I compile. Painfully slowly the demo is transferred to disk, one sector after the other. I have my finger on the drive’s eject button. I am already up, ready to sprint as soon as the drive’s LED goes off. Now! I run like hell, my mind is fried, I rush down the stairs like a madman, raising the disk high in front of me… When I finally manage to find one of the guys in charge of the contest, and articulate some intelligible words to explain what the hell is going on, there are only three demos left to show on the big screen. But I won! I did it! It’s over! I got rid of that damn disk! I can now collapse, sleep, forget, relax…

I find Elric, staring blankly ahead. I am so tired. I explain the situation to him, and I finally take a look at the big screen. As a matter of fact the last two official demos to be projected there are the ones from Oxygene and HMD. Renewed interest. I check them out, as a connoisseur. They are rather good, but they lack something, they are no masterpieces. The Oxygene demo ends. I hold my breath.

One of the speakers starts to talk. Yes, there is a demo left to show. Yes, this is the one from Holocaust. Yes, it should have won if it had been completed in time. But it was not. And there are some minor bugs left in it, so…

True enough, it still has a few weird rendering bugs. Sorry, had no time to fix those. At least there is something to see. The demo is shown on the big screen, and I discover it at the same time as everyone else. I did not even have time to test if that disk worked from start to finish! But everything runs well. No crashes. I hear some reassuring comments around me: “Is this running on STE or STF?”. That one made me smile: dude, of course this is on STF! If you think we need an STE to do that, you’re dead wrong! Fullscreens, 3D, fractal mountains and a few effects never seen before on ST cast a spell on the audience. Nobody expected anything like that from this rather unknown group… The demo ends… and we get a standing ovation! I will never forget that bit. After days of fighting that bug, this was a wonderful, extraordinary reward. I’m smiling, I’m on a cloud. Was I tired a minute ago? My fatigue is completely gone. This was the moment we understood that we had done it. We had reached our goal. We could beat our former heroes, Oxygene and others, at their own game. We could start believing in ourselves.

There are some days in life that you never forget. I had just experienced one of them.

The early days (9/10) - Japtro

May 20th, 2012

The snippets of code used in Rising Force had not been created expressly for the CSC2. They were taken from a much more ambitious project started earlier, codenamed Japtro. At the time there was a plethora of demos with strange, creative labels. Traditionally it all started with intros and demos. But soon enough some hybrids started to appear: trackmos, dentros, etc. I even saw a “pantro” once. And thus, there was no reason for us not to create our own label. This is how the Japanese Dentro, or “Japtro”, was born. At that time, Elric and I were in our Manga phase, and sure enough we wanted to share this with our beloved audience! After the Rising Force experience, we went back to coding like never before, hungry for success. We had tasted a dirty drug called fame.

The new challenge, the new battlefield was 3D. The new opponents, as in a classic DragonBall scenario, were far more powerful than the previous ones: Overlanders, Equinox. The Overlanders, the very people who had motivated us years before, at the Salon de la Micro 1990. The old masters. As for Equinox, represented by Keops, they were the gifted challengers we had to monitor very closely. Needless to say, it was not going to be easy.

But the Rising Force episode had changed everything. We felt ready to compete with anyone. The Japtro project accurately captured this fighting spirit: four disks crammed to the rafters with code and graphics, going in every direction, trampling underfoot the so-called stars of the day (too bad for Keops!) without a shred of scruples. It was rough, naked, stripped, with an almost total lack of design, delivering some raw code to those who wanted it. Design? Why? Sorry, this is not what we do. We did not care a bit about design, and we did not mind pointing out that fact, even if we had to insist rather tediously. We felt like hardcore coders. In fact we felt like the Pure Metal Coders from the Amiga. It is thus no coincidence that the Japtro intro, where the camera moves in a 3D labyrinth that is eventually revealed to be a Japtro logo, is in fact inspired by a PMC demo.

The birth of Japtro was an amazing experience. Elric came to my place a week before the coding-party in which we wanted to present the demo. We already had a lot of code and manga drawings coming from a PC, we just had to put everything together and wrap things up. For a day or two, nothing happened. We did nothing. The day was spent watching demos from other people, mentally preparing ourselves for what would follow. And then we reached the moment when we knew we had to start coding. That was it, now. There was no way to wait any longer. Duly noted.

Intensive coding for 3 or 4 days. Special moments where our friendly rivalry is at its peak, where reality fades, giving way to the creative process that captures all the energy of the author. During these days, what usually never happens did in fact happen: everything clicked. We did not sleep, we barely ate, we just coded. And everything worked the first time. No bugs, no crashes, accurate and well thought out code that practically writes itself. At one point a major source file got accidentally deleted. Usually this is a catastrophe. But this time, we just recoded the whole thing in a few hours, and more efficiently to boot. That was just unheard of. Magic moments when man and machine become one, moments of perfect symbiosis where things Just Work, smoothly and effortlessly. We were on auto-pilot, on a suicide run. We stayed in the flow the whole time, and managed to complete three disks in three days. I still do not know how. I have never experienced such an intense coding session since. At the end of the week we had four disks chock-full of stuff for an epic demo that blew away everything we had done before. We were exhausted, washed out, but proud. We had enough ammo in there to kill any opponent without compunction. High-speed 3D, spectacular fullscreens, a whole bag of new tricks, funny things involving Keops (funny for us at least!), a ton of fullscreen pictures, in short: an orgy of code, music and graphics for a beast of a demo. Four disks at once? Only the Phaleon demo matched that. But dozens of groups contributed to that one, while we were alone! Two coders to fill up four disks. I think nobody ever repeated that feat in the years that followed.

The demo was presented at the Saturn Party II, and it was a success. Still, I was not yet completely satisfied.

The early days (10/10) - Blood

May 20th, 2012

There was something missing. Just before releasing Japtro we had attended the Intermedia Forum, during which we had discovered the Flipo. This was the first demo from Diamond Design – a bunch of wizards that included, among others, some ex-Oxygene members. That demo was a jewel. They did things that we did not know how to do, and I found that unbearable (*). Similarly, the famous 3D demo from the Overlanders in the European Demos, or the fabulous Brain Damage from Aggression certainly felt like a thorn in my side…

(*) Note that the main force behind the Flipo, Oxbab, is now co-president of Naughty Dog and the main force behind Uncharted. Yes, that game. There is something about old ST coders!

Competition, competition… We could not stop there. We had to do something, no matter the cost. Something better. We were good, but we were not clearly better than the others. And I remembered in my dreams the giant gaping hole that had originally separated TCB from the rest of the world… This is why, in a final burst of creativity driven by jealousy and the desire to “rule” once and for all, we went back to coding. And this time it would be to the death.

Always faster, always more. We had often remade the same effects from one demo to the next, but bigger and better. Blood was no exception. I imagined tortured and twisted tricks to speed up our polygon-filling routines. I did sordid experiments with the video screen memory to save a few cycles. Elric pulled out of his magic hat some of the best routines he has ever created. And for once, following the Flipo’s lead, we paid a minimum amount of attention to the demo’s overall design – for the first time we even got graphics from a real artist, Mic/Dune. The difference was obvious!

Years later the first disk of Blood remains one of the things I am the most proud of in my life. I go back to it constantly. I still watch it in 1998, amazed after some years of coding on a PC, by the miracles we pulled off with those simple STs. And I am proud of these routines, proud of having lived those times, proud of our contributions to the ST demo world, proud of having been a part of this adventure. It was not so long ago, but it feels like it was in another life already.

I did not exactly find the same drive, the same passion, the same insanity, the same “yes we can” attitude, the same “try to beat this” challenges in any of the things I did on PC afterwards. Coding on PC always feels a bit sloppy. Made/BOMB once said that when moving from Deluxe Paint to PhotoShop he lost “le contrôle du pixel”, control over individual pixels. That was spot on. And that is exactly what I also thought about the code: on PC, we lost “le contrôle du cycle”, control over individual cycles. I miss that. Coding on PC is goal-oriented: you program something as a means to an end. You program this or that for a game, or for a tool, or just to feed a graphics driver. On the ST however, there was a certain form of beauty in the code itself, in managing to eliminate each and every “useless” instruction, in the certainty you could have that a given piece of code was fully, totally, utterly efficient, without a single wasted cycle. There was a neverending challenge in just pushing your own envelope, competing with yourself, seeing if you could do better. The clear optimization rules on the ST made that possible: if you save one NOP anywhere, it will go faster. If you remove an instruction, it will go faster. It is like a game of Tetris, where you try to remove as many useless instructions as possible while organizing the remaining ones in ways that minimize the amount of holes. On the PC, all bets are off. Saving a NOP might go faster, or not. Removing an instruction might go faster, or not. The “optimization” you do on your machine may make things slower on another machine. The optimization rules one day might be different the next day. This makes all the micro-optimizations we loved spending time on completely pointless.

Coding on ST felt like fun. Coding on PC feels like a job.

Maybe I just stopped being a kid who wanted to take over the world, and I became a grown-up.

In any case, Blood was shown at the Place to Be Again, where it only reached the 2nd place. It did feel like a defeat. Having put everything we had in that demo, it felt very disappointing not to reach the 1st place. It was supposed to be a fight to the death, and we had lost. Soon after we stopped programming on the ST. Some amazing things like Tim Clarke’s Mars had started to appear on the PC, and we slowly moved to that machine. Coding on PC was not much fun. The first days were very frustrating: no INCBIN, no IDE integrated with the assembler (WTF!), memory was apparently limited to 64K in real mode (WTF!!), 80×86 assembly was retarded compared to 68000, we had to use different programs to compile and link (WTF!!!), it felt prehistoric compared to Devpac. But the PC’s brute force power was too alluring to ignore, so we bit the bullet and started learning. We later joined BOMB and a few other groups, but this is another, far less interesting story.

What do I conclude from all this?

I feel lucky to have started programming on those old machines. It was so much fun.

I feel sad for the poor kids starting programming in Java.

Idols, heroes, models, mentors, are only regular humans. You can beat any of them with hard work and a strong motivation. “Yes we can”.

You do not even need the fastest code in the world to do so. We got great results in Choice of Gods using very limited 3D rotation routines we got from STMag, and even bits and pieces of the disassembled Line-A. Jeez we really had no shame. “Whatever works”.

Rules are meant to be bent. Limits and world records are meant to be broken.

Don’t be a “prophet programmer”. Be a Nick.

People are confused

May 15th, 2012

It’s amazing how clueless people are about PhysX. It’s not the first time I read this kind of stuff on forums, but since this one is recent I’ll make a note here:

http://www.gamespot.com/forums/topic/29130252/what-do-you-expect-next-gen-for-dx1211.1-opengl-physx-havok-and-other-apis

“Havok has the advantage on PC for physics since Nvdia GPU accelerated physx is limited to their cards”

This is bullshit, plain and simple. Both Havok and PhysX work perfectly fine in software. PhysX works on anything from PC to PS3/Xbox/Mobile/etc. Basically they are similar libraries doing similar things.

But on top of that, PhysX accelerates some features on Nvidia GPU, while Havok does not. And PhysX is fully free, while you need to pay for Havok in commercial games. PhysX has the advantage here.

There are also differences in terms of performance, memory usage, robustness, features, API user-friendliness, etc. No clear winner in any of those.

Batching incoherent queries

April 7th, 2012

For a binary tree, a typical recursive collision detection function usually looks like this (in this example for a raycast):

void MyTree::Raycast(const Ray& ray, const Node* node)
{
    // Perform ray-AABB overlap test
    if(!RayAABBOverlap(ray, node->GetAABB()))
        return;

    // Ray touches the AABB. If the node is a leaf, test against triangle
    if(node->IsLeaf())
    {
        if(RayTriangleOverlap(ray, node->GetTriangle()))
            RegisterHit();
    }
    else
    {
        // Internal node => recurse to left & right children
        Raycast(ray, node->GetLeftChild());
        Raycast(ray, node->GetRightChild());
    }
}

This traditional tree traversal code has a few problems:

  • A vanilla version suffers from many cache misses since we keep jumping from one node to its children, and there is no guarantee that those child nodes will be close to us.
  • Prefetching might help. However, a typical prefetch instruction has a relatively high latency, i.e. you need to wait a certain number of cycles after the prefetch instruction has been issued before the requested address is effectively “free” to read. The problem with a binary tree such as the above is that there is just not enough work to do in one node (basically just one ray-AABB test) to hide that latency.
  • The problem is the same on platforms where you need to DMA the next nodes. Even if you start your DMA right when entering the function, it may not be finished by the time you reach the end of it.

There are different ways to combat this issue. You can try a more cache-friendly tree layout, say van Emde Boas. You can try a more cache-friendly tree, say N-ary instead of binary. But those alternatives often come with their own set of problems, and most lack the simplicity of a binary tree.

So instead, I would like to do my Mike Acton and ask: when was the last time you had to raycast one ray against that tree? This is not the typical case. The typical case, the generic case even, is when you end up with many raycasts against the same tree.

This kind of batched query has been done before for raycasts. I think it is called “packet query”. But as far as I know, it has always been described for “coherent rays” (and only 4 at a time IIRC), i.e. the kind of rays you get in raytracing when firing N rays from the same pixel (say for supersampling), or from close pixels. In other words the rays are very similar to each other, going in the same direction, and thus they are likely to touch the same nodes during the recursive tree traversal.

But where is this restriction coming from? Who said the rays had to be coherent? They do not have to. It is equally simple to collide N incoherent rays against a mesh.

Here is how.

We need a mesh, i.e. a binary tree, and then a linear array of rays. I will give the code first and then comment it. The code looks roughly like this:

// 'rays' is non-const: the array is reordered in place during the traversal
void MyTree::MultiRaycast(int nbRays, Ray* rays, const Node* node)
{
    // Collide all rays against current AABB
    int Offset;
    {
        const AABB& Box = node->GetAABB();

        Offset = 0;
        int Last = nbRays;

        while(Offset!=Last)
        {
            if(RayAABBOverlap(rays[Offset], Box))
            {
                // Ray touches the box: keep it at the front of the array
                Offset++;
            }
            else
            {
                // Ray misses the box: swap it to the back. This is the 'move-to-front'
                // transform that gathers the rays touching the box at the beginning.
                // The ray swapped in from the back gets tested on the next iteration.
                Last--;
                Swap(rays[Offset], rays[Last]);
            }
        }
        if(!Offset)
            return;
    }

    // Here, Offset = number of rays that touched the box. The array has been reorganized so
    // that those rays that passed the test are now at the beginning of the array. The rays
    // that did not touch the box are all at the end.

    // If the node is a leaf, test surviving rays against triangle
    if(node->IsLeaf())
    {
        const Triangle T = node->GetTriangle();
        for(int i=0;i<Offset;i++)
        {
            if(RayTriangleOverlap(rays[i], T))
                RegisterHit();
        }
    }
    else
    {
        // Internal node => recurse to left & right children, passing only the surviving rays
        MultiRaycast(Offset, rays, node->GetLeftChild());
        MultiRaycast(Offset, rays, node->GetRightChild());
    }
}

So first we test all incoming rays against the node’s AABB. When doing so, we reorganize the rays so that the ones that passed the test are now at the beginning of the array. The rays that did not touch the box are all at the end. This is easily done with a simple ‘move-to-front’ operation. Astute readers will probably notice that this is similar to how the tree itself was built in Opcode, when the ‘positive’ and ‘negative’ primitives got reorganized at build time, in a similar fashion.
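The partition step on its own can be sketched as a small generic function. This is a hypothetical helper (`PartitionSurvivors` is not part of the original code), written against `std::vector` for illustration; it mirrors the Offset/Last loop above. Like `std::partition`, it does not preserve the relative order of the elements, and it never touches anything past `count`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: reorders items[0, count) so that elements satisfying
// 'pred' end up in [0, k) and the others in [k, count), then returns k.
// Elements at index >= count are never touched, which is what allows a child
// node to work on the parent's prefix without disturbing the rest.
template <typename T, typename Pred>
std::size_t PartitionSurvivors(std::vector<T>& items, std::size_t count, Pred pred)
{
    assert(count <= items.size());
    std::size_t offset = 0;
    std::size_t last = count;
    while (offset != last)
    {
        if (pred(items[offset]))
            ++offset;                                 // survivor: leave it in the front part
        else
            std::swap(items[offset], items[--last]);  // reject: send it to the back, re-test the swapped-in item
    }
    return offset;
}
```

In practice `std::partition` from `<algorithm>` does the same job; the explicit loop simply makes the swap-to-the-back visible.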

After the AABB test, if the node is a leaf, we simply test all surviving rays against the node’s triangle. Otherwise we just recurse as usual, with a simple twist: we pass down the number of surviving rays, not the original number.

And that’s it, done. Simple enough, right? The only vaguely subtle conceptual difficulty is seeing why the recursive descent does not invalidate the re-ordering we did in the parent node. This is simply because regardless of how the child nodes reorganize the array, they only reorganize the number of rays we give them, i.e. the surviving rays. And no matter how those guys get shuffled down the tree, when the code comes back from the recursion (say after visiting the left child), we still pass a valid array of surviving rays to the right child. They are not in the same order anymore, but the permutation has only been applied to the first Offset entries of the array, so we are still good to go.
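One way to convince yourself that the reordering is harmless is to run the batched traversal and the one-ray-at-a-time traversal side by side and compare the hits. The sketch below is self-contained and makes several simplifying assumptions that are not from the original code: the tree is a flat `std::vector<Node>` with child indices, a leaf's own box stands in for `RayTriangleOverlap`, and hits are recorded as (ray id, leaf id) pairs instead of calling `RegisterHit()`. Each ray carries an `id` because the array gets shuffled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };
struct Ray  { Vec3 origin, dir; int id; };  // id survives the reordering of the array

// Hypothetical flat tree: children are indices, -1 marks a leaf.
// A leaf's box stands in for the triangle test of the original code.
struct Node
{
    AABB box;
    int left = -1, right = -1;
    int leafId = -1;
};

using Hits = std::vector<std::pair<int, int>>;  // (ray id, leaf id)

// Slab test for a ray with t >= 0. Zero direction components are handled
// explicitly to avoid 0 * infinity producing NaN.
bool RayAABBOverlap(const Ray& r, const AABB& b)
{
    const float o[3]  = { r.origin.x, r.origin.y, r.origin.z };
    const float d[3]  = { r.dir.x, r.dir.y, r.dir.z };
    const float lo[3] = { b.min.x, b.min.y, b.min.z };
    const float hi[3] = { b.max.x, b.max.y, b.max.z };
    float tmin = 0.0f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a)
    {
        if (d[a] == 0.0f)
        {
            if (o[a] < lo[a] || o[a] > hi[a]) return false;  // parallel to the slab and outside it
            continue;
        }
        float t0 = (lo[a] - o[a]) / d[a];
        float t1 = (hi[a] - o[a]) / d[a];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmin > tmax) return false;
    }
    return true;
}

// Reference: one ray at a time, same structure as MyTree::Raycast.
void SingleRaycast(const std::vector<Node>& tree, int n, const Ray& ray, Hits& hits)
{
    const Node& node = tree[n];
    if (!RayAABBOverlap(ray, node.box)) return;
    if (node.left < 0) { hits.emplace_back(ray.id, node.leafId); return; }
    SingleRaycast(tree, node.left, ray, hits);
    SingleRaycast(tree, node.right, ray, hits);
}

// Batched: partition so rays touching the box come first, recurse on that prefix only.
void MultiRaycast(const std::vector<Node>& tree, int n, Ray* rays, std::size_t nbRays, Hits& hits)
{
    assert(n >= 0 && static_cast<std::size_t>(n) < tree.size());
    const Node& node = tree[n];
    std::size_t offset = 0, last = nbRays;
    while (offset != last)
    {
        if (RayAABBOverlap(rays[offset], node.box)) ++offset;
        else std::swap(rays[offset], rays[--last]);  // miss: move to the back, re-test the swapped-in ray
    }
    if (offset == 0) return;
    if (node.left < 0)
    {
        for (std::size_t i = 0; i < offset; ++i) hits.emplace_back(rays[i].id, node.leafId);
        return;
    }
    // The left child only permutes rays[0, offset), so the right child gets the same survivors.
    MultiRaycast(tree, node.left, rays, offset, hits);
    MultiRaycast(tree, node.right, rays, offset, hits);
}
```

After sorting both hit lists, they should be identical: the batched version changes the order in which hits are found, not which hits are found.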

So what do we get out of this? Well, a lot:

  • We are now working on N rays at a time, not just one. So that is a lot of work to do in each node, and there is plenty of time for your prefetch or your DMAs to finish.
  • The N rays we operate on are stored in a contiguous array, so accessing them is still very cache friendly. That would not be the case if we had put all our rays in a hierarchy, for example, and collided one ray-tree against the mesh.
  • We have N rays at a time, to collide against the same box or the same triangle. This is a perfect SIMD-friendly situation.
  • We traverse the tree only once now, so even if your tree layout is not cache-friendly at all, the number of traversal-related cache misses is minimized overall. In fact you can imagine it has just been divided by the number of batched rays…
  • We traverse each node only once now, so any extra work that you sometimes must do to access the AABB, like dequantization, is now only performed once instead of N times. Think about it: if you raycast N rays against the same mesh, you are dequantizing the top level AABB N times for example. With this new approach however, you do that only once.
  • In a similar way, we fetch a candidate triangle only once and raycast N rays against it. So the penalties associated with getting the triangle data (and there are quite a few, usually) are also reduced when multiple rays end up touching the same triangle.
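To make the SIMD point concrete, here is a minimal sketch of testing a whole batch against one box, assuming a hypothetical structure-of-arrays layout (`RayBatchSoA`, one array per component) with precomputed inverse directions and no zero direction components. The box values are loop-invariant and every ray goes through the same arithmetic with no early exit, which is the shape compilers can often auto-vectorize; whether a given compiler actually does so depends on the compiler and flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SoA ray batch. Assumes 1/direction is precomputed and that
// no direction component is zero (otherwise 0 * inf could produce NaN).
struct RayBatchSoA
{
    std::vector<float> ox, oy, oz;        // origins
    std::vector<float> invx, invy, invz;  // 1 / direction
};

// Tests every ray of the batch against the same box; hit[i] = 1 if ray i touches it.
void RaysVsBox(const RayBatchSoA& r, const float bmin[3], const float bmax[3],
               std::vector<std::uint8_t>& hit)
{
    const std::size_t n = r.ox.size();
    assert(r.oy.size() == n && r.oz.size() == n);
    assert(r.invx.size() == n && r.invy.size() == n && r.invz.size() == n);
    hit.resize(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        const float tx0 = (bmin[0] - r.ox[i]) * r.invx[i], tx1 = (bmax[0] - r.ox[i]) * r.invx[i];
        const float ty0 = (bmin[1] - r.oy[i]) * r.invy[i], ty1 = (bmax[1] - r.oy[i]) * r.invy[i];
        const float tz0 = (bmin[2] - r.oz[i]) * r.invz[i], tz1 = (bmax[2] - r.oz[i]) * r.invz[i];
        // Entry = latest slab entry (clamped to t >= 0), exit = earliest slab exit
        const float tEnter = std::max(std::max(std::min(tx0, tx1), std::min(ty0, ty1)),
                                      std::max(std::min(tz0, tz1), 0.0f));
        const float tExit  = std::min(std::min(std::max(tx0, tx1), std::max(ty0, ty1)),
                                      std::max(tz0, tz1));
        hit[i] = static_cast<std::uint8_t>(tEnter <= tExit);
    }
}
```

The resulting mask could then drive the move-to-front partition, so the per-ray overlap work and the reordering are kept as two separate passes.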

I illustrated this idea with raycasts, but of course it works equally well with overlap tests or sweeps.

I do not know if it is a new idea or not, but I haven’t seen it described anywhere before.

Fracture demo from GDC

March 16th, 2012

http://www.youtube.com/watch?v=QxJacxI4_oE&feature=share

New broad-phase demo

December 22nd, 2011

We recently had a friendly-yet-furious internal competition at work, where the goal was to create the fastest broad-phase algorithm ever.

We had slowly reached the limits of what we can do with a 3-axis sweep-and-prune (SAP), and so we were trying to finally move away from it. This is not very easy, as SAP is very efficient in typical game scenes. Unfortunately, as the total number of objects in games increases and game worlds become more and more dynamic, SAP performance increasingly becomes an issue.
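For readers unfamiliar with SAP, here is a minimal sketch of the underlying sort-and-sweep idea, under simplifying assumptions: a non-incremental, single-axis variant (often called box pruning) that re-sorts everything from scratch. A 3-axis SAP instead keeps sorted endpoint lists on all three axes and updates them incrementally as objects move. `Box`, `Overlap` and `SweepAndPruneX` are hypothetical names, not PhysX or Bullet API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Box { float min[3], max[3]; };

// Full 3D AABB overlap test
bool Overlap(const Box& a, const Box& b)
{
    for (int k = 0; k < 3; ++k)
    {
        assert(a.min[k] <= a.max[k] && b.min[k] <= b.max[k]);  // boxes must be valid
        if (a.max[k] < b.min[k] || b.max[k] < a.min[k]) return false;
    }
    return true;
}

// Sort boxes by min.x, then sweep: a box only needs to be compared with the
// boxes that start before it ends on x. Returns overlapping pairs (i < j).
std::vector<std::pair<int, int>> SweepAndPruneX(const std::vector<Box>& boxes)
{
    std::vector<int> order(boxes.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return boxes[a].min[0] < boxes[b].min[0]; });

    std::vector<std::pair<int, int>> pairs;
    for (std::size_t i = 0; i < order.size(); ++i)
    {
        const Box& a = boxes[order[i]];
        for (std::size_t j = i + 1; j < order.size(); ++j)
        {
            const Box& b = boxes[order[j]];
            if (b.min[0] > a.max[0]) break;  // sorted on min.x: no later box can overlap 'a'
            if (Overlap(a, b))
                pairs.emplace_back(std::min(order[i], order[j]), std::max(order[i], order[j]));
        }
    }
    return pairs;
}
```

The early `break` is where the pruning comes from: in a scene where objects are spread out along x, each box is compared against only a few neighbors instead of every other object.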

Several people attacked the problem with several (and very different) approaches. I submitted two entries: one was the multi-SAP (MSAP) broad-phase I previously mentioned; the other one is something I probably cannot talk about yet, so let’s just call it “broad-phase X” for now. Long story short: this last algorithm won.

Out of curiosity, and since I had done it before with MSAP already, I tried it out in Bullet’s “CDTestFramework” to compare it against Bullet’s algorithms. On my machine the new broad-phase is a little bit more than 4X faster than any of Bullet’s versions, and more than 2X faster than MSAP. Of course the relative performances of all those algorithms vary with the total number of objects in the scene, how many of them are static, how many of them are moving, how fast they are moving, etc. So there isn’t usually a clear winner for all possible scenarios. Still, at work we tested those new algorithms in dozens and dozens of scenes with different characteristics, and this “broad-phase X” always performed remarkably well.

The demo is straight out of Bullet 2.78, compiled in Release mode with full optimizations and the SSE2 flag enabled.

Click here to download

The new broad-phase should be available in PhysX 3.3. Stay tuned! :)

Rayman Origins

December 14th, 2011

I’m listed in the Rayman Origins credits, in the “Special Thanks” section :)

I just contributed some code. Yay!

http://www.youtube.com/watch?v=jG2OU5fJyPM (around 9′20)

Iceland rules :)

December 1st, 2011

http://richarddawkins.net/articles/644070-let-s-talk-about-evolution

Knew it.

Hello world!

October 26th, 2011

Raphaël was born on October 23, 2011. He says hello!

 
