Number Encoding III
January 8, 2010
The code is written in plain C and contains three tests:
- A basic bitshift-based encoding that is reasonably fast and produces data that is portable across big-endian and little-endian machines.
- A varint-based encoding implemented to the specification published by Google (the one used by Protocol Buffers).
- A compressed varint-based encoding, which is a variation on the varint-based encoding. The difference is that I chose to represent zero with zero bits (no byte at all) instead of a full 8-bit byte, which saves a byte whenever a zero needs to be stored. When I checked some of the test collection indices, about 10% of the values turned out to be zero, so this saves some space (and less data means less I/O).
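To illustrate what a bitshift-based encoding means here, the sketch below writes a 32-bit value as four bytes, most significant byte first. It is not the code from the download: the function names put_u32 and get_u32 and the fixed 4-byte, big-endian layout are assumptions for illustration. Because the bytes are extracted with shifts rather than by copying the value's in-memory representation, the output is the same on big-endian and little-endian machines.

```c
#include <stdint.h>

/* Hypothetical helper: store v in buf[0..3], most significant byte first.
 * Shifting (instead of memcpy of the in-memory value) makes the byte order
 * in buf independent of the host's endianness. */
void put_u32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Hypothetical helper: reassemble the value written by put_u32. */
uint32_t get_u32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
           (uint32_t)buf[3];
}
```

For example, put_u32 turns 0x12345678 into the bytes 12 34 56 78 on any host, and get_u32 reads them back as 0x12345678. The trade-off is that every value takes four bytes no matter how small it is.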
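The varint encoding Google specifies for Protocol Buffers stores seven payload bits per byte, least significant group first, and sets the high bit of a byte when another byte follows. A minimal sketch for unsigned 32-bit values is below; the names varint_encode and varint_decode are hypothetical and not taken from perf_test2.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v as a base-128 varint: 7 payload bits per byte, low bits first,
 * high bit set on every byte except the last. buf needs room for 5 bytes.
 * Returns the number of bytes written (1 to 5). */
size_t varint_encode(uint32_t v, unsigned char *buf)
{
    size_t n = 0;
    while (v >= 0x80) {
        buf[n++] = (unsigned char)((v & 0x7F) | 0x80); /* more bytes follow */
        v >>= 7;
    }
    buf[n++] = (unsigned char)v; /* final byte, high bit clear */
    return n;
}

/* Decode a varint from buf into *out. Returns the number of bytes consumed,
 * or 0 if no terminating byte appears within 5 bytes (malformed for a
 * 32-bit value). */
size_t varint_decode(const unsigned char *buf, uint32_t *out)
{
    uint32_t v = 0;
    size_t i;
    for (i = 0; i < 5; i++) {
        v |= (uint32_t)(buf[i] & 0x7F) << (7 * i);
        if (!(buf[i] & 0x80)) {
            *out = v;
            return i + 1;
        }
    }
    return 0;
}
```

With this scheme 300 encodes to the two bytes 0xAC 0x02, values below 128 fit in a single byte, and the largest 32-bit values need five bytes. Note that 0 still costs one byte (0x00).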
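The post doesn't describe how a zero that occupies no bytes is recognized when decoding, so the sketch below is only one possible reading, not the author's implementation. It assumes the byte length of each encoded value is recorded somewhere else (for example in a separate length field), which makes an empty encoding unambiguous. The names cvarint_encode and cvarint_decode are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "compressed varint": identical to a standard base-128 varint
 * except that 0 is written as no bytes at all. ASSUMPTION: the caller stores
 * the returned length elsewhere, since an empty encoding cannot be detected
 * from the byte stream alone. buf needs room for 5 bytes. */
size_t cvarint_encode(uint32_t v, unsigned char *buf)
{
    size_t n = 0;
    while (v != 0) {
        unsigned char b = (unsigned char)(v & 0x7F);
        v >>= 7;
        if (v != 0)
            b |= 0x80; /* more bytes follow */
        buf[n++] = b;
    }
    return n; /* 0 for v == 0; otherwise the same bytes as a standard varint */
}

/* Decode a value whose encoded length len (0 to 5) is known from the
 * assumed out-of-band length field. len == 0 decodes to 0. */
uint32_t cvarint_decode(const unsigned char *buf, size_t len)
{
    uint32_t v = 0;
    size_t i;
    for (i = 0; i < len && i < 5; i++)
        v |= (uint32_t)(buf[i] & 0x7F) << (7 * i);
    return v;
}
```

Nonzero values produce exactly the same bytes as a standard varint; only zero changes, from one byte to none. If roughly 10% of the stored values are zero, that saves about one byte for every ten values, minus whatever the length bookkeeping costs.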
You can download the code here.
You can compile the code in debug mode as follows:
gcc -g -o perf_test2 perf_test2.c
Or compile it with optimizations enabled as follows (-O3 gave the best results):
gcc -O3 -o perf_test2 perf_test2.c
Then run it as follows:
./perf_test2
Let me know how it works for you and whether you spot any further optimizations.