|
|
|
Simple C++ optimizations - Part 1
an example of low-level optimization |
|
The following snippet is quite common: float f,g,x;
If the value of x is highly unpredictable, the resulting branch could be quite expensive.... if (x>=0.0) f += g; Let's write a function that returns 1.0 if x is positive and 0.0 if it is negative: inline float ispositive(float x) {
Now we'll write instedreturn (x>=0.0?1.0:0.0);
}f += g * ispositive(x);
Of course we just shuffled the problem around (actually worsened, efficiency-wise) but we can
remove the branch using the common memory-manipulation approach.inline float ispositive(float x) {
union {
float f;
} v;int i; v.f = x; //return (x>=0.0?1.0:0.0);
}
Now v.i will hold the 32 bits of the float x. If you are unfamiliar with the IEEE floating point representation, this page
is an excellent source of information. For our purpose we just need to
know that the sign is stored in most significant bit (pun, albeit
non-phonetical, intended), and the floating point representations of the two possible output values: 0.0f is
0...00 (all zeroes), while 1.0f is 001111110...00, or 0x3f800000 in hexadecimal. We must somehow transform our sign bit into the correct output.So what are we going to do with our 32 bits in v.i? Let's start by discarding what is useless: v.i &= 0x80000000;
v.i = ~v.i;
v.i &= 0x80000000; v.i = v.i >> 31;
At this point, if x is positive (sign bit 0) we have 1...11 and 0...00 otherwise. A most convenient pattern, since v.i &= 0x3f800000;
This is the function so far - the right shift can be limited to 8 since the last 23 bits are going to be zeroed anyway: inline float ispositive(float x) {
union {
float f;
} v;int i; v.f = x; //v.i = ~v.i; //v.i = v.i >> 8; //v.i &= 0x3f800000; v.i = (int(~v.i) >> 8) & 0x3f800000; //compact form return v.f; }
Or, getting rid of the union (I suspect this could let the compiler do some marginal optimizations with complex expressions): inline float ispositive(float x) {
int i = (int(~*(reinterpret_cast<int*>(&x))) >> 8) & 0x3f800000;
}return *(reinterpret_cast<float*>(&i));
The saga continues in part 2 with an example of high-level, and definitely wiser, optimization. |
|
|