Interestingly, there is no algorithm for finding exact nearest-neighbor in high-dimensional spaces (with 25 dimensions/features or more) that would be any more efficient than brute force search. There exists several algorithms that are able to identify approximate nearest neighbors, but at the onset of the Mobvis project (in 2005) none of these approximate algorithms performed any better than the brute force approach for high-dimensional visual features that we were trying to match in order to devise an augmented reality system. One of the advantages of working in academy is that you don't need to cut corners but you actually have time to attack problems head on. So I took the challenge of developing a novel approximate nearest neighbor search algorithm based on sparse coding with an overcomplete basis. The first step in developing a new algorithm is to evaluate existing algorithms. In my case, primary competitor was the brute force algorithm which is trivial to implement, since it is just a vector-matrix multiplication. Since modern microprocessors are heavily optimized for vector operations, the brute force algorithm is also brutally efficient. Even though my algorithm required 1000x less operations on a data set of 1M high-dimensional features, than the brute force algorithm, it was only 10x faster (which is barely enough to make up for the approximateness of my algorithm). It was a challenge I could not resist, so I've set on making the fastest possible implementation of my algorithm. I've implemented the algorithm in as low-level C as possible, but when I counted the number of operations per second that brute force algorithm managed to perform versus number of operations per second that my algorithm did, the difference was still hundredfold. Of course, the next step was implementing my algorithm in assembly language, but to my great surprise I couldn't even make it as fast as my C implementation. I did some development in assembly on 80386 processor in the early 90s. Since that time the complexity of microprocessors have increased tremendously and, as I have found out, nowadays it is impossible to craft better assembly code than what compiler is capable of producing, unless you have a very detailed knowledge of how the specific microprocessor executes operations and what caching strategies it uses.
From this experiment I have learned, that the lowest that a modern day programmer can get is the level of C. Anyone who claims that is capable of producing more efficient code than Intel C compiler is either misguided or is working for Intel.
Google+ The demise of the low level Programmer. When I started programming many of the elements we take for granted now, did not exist. There was no DirectX and not many compatible libs were available for the free compilers of the day.
- C/C++ Low Level Curriculum Part 7: More Conditionals (altdevblogaday.com)
- The "C is Efficient" Language Fallacy  (scienceblogs.com)
- Introduction to 64 Bit Intel Assembly Language Programming for Linux - Book Review (baptiste-wicht.com)