Simple C++ optimizations - Part 2
an example of high-level optimization




DISCLAIMER: sexual allusions and graphic violence were added solely for a more entertaining reading experience.

Once upon a time I wanted to calculate the approximate length of the sides of a polygon that gets projected onto one pixel on the screen;
to make a long story short, and simplifying to a 2D-to-1D-projection, I had (insert the canonical image of an ABC triangle with opposite angles alpha, beta, gamma):
C: vector to camera
N: normal in camera space
B: useless second side of the triangle :)
|A|: incognita, 'gn' to be pronounced like 'ñ' of 'niño' and NOT like 'gn' of 'gnoseology', even though the latter evidently shows an etymological link, and this lengthy and inconvenient digression is meant to boldly state that I like to think that Latin pronunciation is SO TOTALLY different from Greek - or English
Using basic trigonometry - and a very good lens to inspect the imaginary picture above - one sees that
|A| = sin(alpha) * |C| / sin(gamma)
and I could calculate alpha and gamma like this:
alpha = atan(x') - atan(x)
beta = acos(dotproduct(normalize(C),N))
gamma = pi/2 - (alpha + beta)
Don't worry about the details (or, forget everything you read so far): the implementation is of no importance here, I'm using it as an example to show the opportunity for 'conceptual' optimizations.
The resulting C++ code could be something like this:
vector2 C,N;
...
float lengthC = length(C);
normalize(C);
float beta = acos(dot(C,N));
float alpha = atan(x1) - atan(x);
float gamma = M_PI_2 - (alpha + beta);
float side = sin(alpha) * lengthC / sin(gamma);
This 'naive' code is not exactly fast, as it has to deal with 5 trigonometric functions per iteration.

Now the fun begins (though most of what I've written above is laughable, already). We will try to get rid of these expensive functions, or replace them with less expensive ones, by just applying mathematical identities. We first observe that sin(pi/2-x) = cos(x), hence:
//float gamma = M_PI_2 - (alpha + beta); farewell, good ol' gamma
float newgamma = alpha + beta; //do I like you?

float side = sin(alpha) * lengthC / cos(newgamma);
Then, since cos(alpha+beta) = cos(alpha)*cos(beta) - sin(alpha)*sin(beta)
//float newgamma = alpha + beta; newgamma wasn't nearly as nice, anyway
float side = sin(alpha) * lengthC / (cos(alpha)*cos(beta) - sin(alpha)*sin(beta));
This 'optimization' looks sub-optimal indeed, but beta = acos(dot(C,N)), so cos(beta) = dot(C,N), so
//float beta = acos(dot(C,N)); another fine but inefficient float leaving the scene
float cosbeta = dot(C,N);
float side = sin(alpha) * lengthC / (cos(alpha)*cosbeta - sin(alpha)*sin(acos(cosbeta)));
Square root is faster than trigonometric functions, and sin(x)=sqrt(1-cos2(x)), so
float cosalpha = cos(alpha);
float sinalpha = sqrt(1.0-cosalpha*cosalpha);
float side = sinalpha * lengthC / (cosalpha*cosbeta - sinalpha*sqrt(1.0-cosbeta*cosbeta));

So far we have this:
vector2 C,N;
...
float lengthC = length(C);
//normalize(C); why be redundant?
C *= (1.0/lenghtC); //assuming per-component operator *= override
float cosbeta = dot(C,N);
float alpha = atan(x1) - atan(x);
float cosalpha = cos(alpha);
float sinalpha = sqrt(1.0-cosalpha*cosalpha);
float side = sinalpha * lengthC / (cosalpha*cosbeta - sinalpha*sqrt(1.0-cosbeta*cosbeta));
We traded one acos plus two sins for one cos plus two square roots. Not bad, especially if you are a religious person, or a cubist gardener.

The remaining trouble lies in the mysterious alpha; x1 is so called because is a slightly incremented x. Derivative, anyone?
So atan(x1) can be approximated by atan(x) + (d/dx(atan(x)) * (x1-x)), and the derivative of atan(x) is 1/(1+x2), hence
//float alpha = atan(x1) - atan(x); becomes
//float alpha = (atan(x) + 1.0/(1.0+x*x) * (x1-x)) - atan(x); which simplifies to

float alpha = (x1-x) / (1.0+x*x) 
and luckily, this concludes our exciting journey in the adrenaline-driven world of code optimization. Well, if you're not too shaken, there's always the perilous quest of part 1.