Box pruning revisited - part 10 - integer SIMD
Interlude:
Similar to what we did in version 7, and for the sake of completeness, I tried using integer comparisons in the SIMD version as well.
The changes are straightforward: encode the floats as integers like we did in version 7, and replace the SIMD floating-point intrinsics with their SIMD integer counterparts.
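To make the idea concrete, here is a minimal sketch of what such a change can look like. It is not the actual code from this series: the names `EncodeFloatSigned`, `EncodedBoxYZ`, `MakeEncodedBox` and `OverlapYZ` are hypothetical, the box layout is an assumption, and the encoding is a signed variant chosen because SSE2 only offers signed 32-bit comparisons (the encoding used in version 7 may differ). It also assumes IEEE-754 floats, no NaNs, and an x86 target with SSE2.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 integer intrinsics

// Hypothetical helper: maps a float to a signed 32-bit integer whose
// ordering matches the ordering of the original floats (NaNs excluded).
static inline int32_t EncodeFloatSigned(float f)
{
    int32_t i;
    std::memcpy(&i, &f, sizeof(i));  // reinterpret the bits without aliasing issues
    // Negative floats sort "backwards" as raw integers, so flip their
    // 31 low bits. Positive floats are already in the right order.
    if (i < 0)
        i ^= 0x7FFFFFFF;
    return i;
}

// Hypothetical layout: one box, Y and Z bounds only, already encoded.
struct EncodedBoxYZ
{
    int32_t minY, minZ, maxY, maxZ;
};

static inline EncodedBoxYZ MakeEncodedBox(float minY, float minZ, float maxY, float maxZ)
{
    return { EncodeFloatSigned(minY), EncodeFloatSigned(minZ),
             EncodeFloatSigned(maxY), EncodeFloatSigned(maxZ) };
}

// Returns true when the boxes overlap on both Y and Z (touching counts).
static inline bool OverlapYZ(const EncodedBoxYZ& a, const EncodedBoxYZ& b)
{
    // lanes (low to high): a.minY, a.minZ, b.minY, b.minZ
    const __m128i mins = _mm_set_epi32(b.minZ, b.minY, a.minZ, a.minY);
    // lanes (low to high): b.maxY, b.maxZ, a.maxY, a.maxZ
    const __m128i maxs = _mm_set_epi32(a.maxZ, a.maxY, b.maxZ, b.maxY);
    // Integer counterpart of _mm_cmpgt_ps: a lane is all ones where
    // min > max, i.e. where the two boxes are separated on that axis.
    const __m128i separated = _mm_cmpgt_epi32(mins, maxs);
    return _mm_movemask_epi8(separated) == 0;
}
```

With this sketch, `OverlapYZ(MakeEncodedBox(0, 0, 1, 1), MakeEncodedBox(2, 0, 3, 1))` reports no overlap, since the second box starts beyond the first one on Y. In a real implementation the encoding would be done once per box up front, and the boxes would be loaded from memory rather than rebuilt with `_mm_set_epi32` for each test.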
There are no special traps here, and not much to report, because at the end of the day the integer version is slower than version 9c:
Home PC:
Complete test (brute force): found 11811 intersections in 781780 K-cycles.
17454 K-cycles.
16681 K-cycles.
16669 K-cycles.
17145 K-cycles.
16703 K-cycles.
16683 K-cycles.
16668 K-cycles.
16667 K-cycles.
16862 K-cycles.
16707 K-cycles.
16670 K-cycles.
16689 K-cycles.
16668 K-cycles.
16949 K-cycles.
16710 K-cycles.
16667 K-cycles.
Complete test (box pruning): found 11715 intersections in 16667 K-cycles.
Office PC:
Complete test (brute force): found 11811 intersections in 810906 K-cycles.
16607 K-cycles.
15927 K-cycles.
15648 K-cycles.
15971 K-cycles.
15648 K-cycles.
15960 K-cycles.
15742 K-cycles.
15990 K-cycles.
15837 K-cycles.
15741 K-cycles.
15970 K-cycles.
15651 K-cycles.
16247 K-cycles.
15649 K-cycles.
15834 K-cycles.
15738 K-cycles.
Complete test (box pruning): found 11715 intersections in 15648 K-cycles.
The gains are summarized here:
Home PC | Timings (K-Cycles) | Delta (K-Cycles) | Speedup | Overall X factor |
(Version1) | (101662) | | | |
Version2 - base | 98822 | 0 | 0% | 1.0 |
Version3 | 93138 | ~5600 | ~5% | ~1.06 |
Version4 | 81834 | ~11000 | ~12% | ~1.20 |
Version5 | 78140 | ~3600 | ~4% | ~1.26 |
Version6a | 60579 | ~17000 | ~22% | ~1.63 |
Version6b | 41605 | ~18000 | ~31% | ~2.37 |
(Version7) | (40906) | - | - | - |
(Version8) | (31383) | (~10000) | (~24%) | (~3.14) |
Version9a | 34486 | ~7100 | ~17% | ~2.86 |
Version9b - unsafe | 32477 | ~2000 | ~5% | ~3.04 |
Version9b - safe | 32565 | ~1900 | ~5% | ~3.03 |
Version9c - unsafe | 16223 | ~16000 | ~50% | ~6.09 |
Version9c - safe | 14802 | ~17000 | ~54% | ~6.67 |
Version10 | (16667) | - | - | - |
Office PC | Timings (K-Cycles) | Delta (K-Cycles) | Speedup | Overall X factor |
(Version1) | (96203) | | | |
Version2 - base | 92885 | 0 | 0% | 1.0 |
Version3 | 88352 | ~4500 | ~5% | ~1.05 |
Version4 | 77156 | ~11000 | ~12% | ~1.20 |
Version5 | 73778 | ~3300 | ~4% | ~1.25 |
Version6a | 58451 | ~15000 | ~20% | ~1.58 |
Version6b | 45634 | ~12000 | ~21% | ~2.03 |
(Version7) | (43987) | - | - | - |
(Version8) | (29083) | (~16000) | (~36%) | (~3.19) |
Version9a | 31864 | ~13000 | ~30% | ~2.91 |
Version9b - unsafe | 15097 | ~16000 | ~52% | ~6.15 |
Version9b - safe | 15116 | ~16000 | ~52% | ~6.14 |
Version9c - unsafe | 12707 | ~2300 | ~15% | ~7.30 |
Version9c - safe | 12562 | ~2500 | ~16% | ~7.39 |
Version10 | (15648) | - | - | - |
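For readers wondering how the Delta, Speedup and Overall X factor columns relate to the timings, the numbers are consistent with the arithmetic sketched below. This is my reading of the figures rather than something the text spells out, and `Gain` and `ComputeGain` are hypothetical names: the delta is taken against the row a version is compared with, the speedup is that delta relative to the compared row, and the overall factor is relative to the Version2 baseline.

```cpp
// Hypothetical helper reproducing the table columns from raw timings.
struct Gain
{
    long   delta;           // K-cycles saved versus the compared row
    double speedupPercent;  // delta as a percentage of the compared row
    double overallFactor;   // baseline timing divided by current timing
};

static Gain ComputeGain(long baseline, long previous, long current)
{
    Gain g;
    g.delta          = previous - current;
    g.speedupPercent = 100.0 * double(g.delta) / double(previous);
    g.overallFactor  = double(baseline) / double(current);
    return g;
}
```

For example, with the Home PC figures for Version4, `ComputeGain(98822, 93138, 81834)` gives a delta of 11304 K-cycles, a speedup of roughly 12.1% and an overall factor of roughly 1.21, which the table reports as ~11000, ~12% and ~1.20 (the published values look truncated rather than rounded).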
What we learnt:
Don’t bother with integer comparisons anymore.
We are still using integer SIMD in PhysX for this, so it appears that the PhysX version is sub-optimal. Expect some performance gains in PhysX 3.5.