Number Encoding III

I decided to make the code for the number encoding benchmark available for download (see my earlier posts “Number Encoding” and “Number Encoding II”).

The code is basic C code and contains three tests:

  • A basic bitshift-based encoding, which is reasonably fast and produces data that is portable across big-endian and little-endian machines.
  • A varint-based encoding programmed to the spec issued by Google.
  • A compressed varint-based encoding, which is a variation on the varint-based encoding. The difference is that I chose to store a zero in 0 bits rather than in a full 8-bit byte, which saves a byte whenever a zero needs to be stored. When I checked some of the test collection indices, it turned out that about 10% of the data is zero, so this saved me some space (less data means less I/O).
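
For readers who have not seen the first two schemes, here is a minimal sketch of what they can look like. It is not the code from the download: the function names, the 32-bit value width, and the most-significant-byte-first order of the bitshift version are my assumptions for illustration. The varint follows the usual layout of 7 payload bits per byte, low-order group first, with the high bit marking a continuation. The compressed variant is left out because its on-disk layout is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitshift-based encoding (illustrative): write a 32-bit value as 4 bytes,
   most significant byte first. The byte order comes from the shifts, not
   from the host CPU, so the output is the same on any machine. */
static size_t shift_encode(uint32_t value, uint8_t *buf)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;
    return 4;
}

static uint32_t shift_decode(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Varint encoding (illustrative): 7 bits of payload per byte, low-order
   group first. The high bit is set on every byte except the last one.
   A 32-bit value needs at most 5 bytes. Returns the bytes written. */
static size_t varint_encode(uint32_t value, uint8_t *buf)
{
    size_t n = 0;
    assert(buf != NULL);
    while (value >= 0x80) {
        buf[n++] = (uint8_t)(value | 0x80); /* low 7 bits + continuation */
        value >>= 7;
    }
    buf[n++] = (uint8_t)value;              /* final byte, high bit clear */
    return n;
}

/* Decodes one varint into *out and returns the number of bytes consumed.
   Stops after 5 bytes so a malformed input cannot over-shift. */
static size_t varint_decode(const uint8_t *buf, uint32_t *out)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    assert(buf != NULL && out != NULL);
    for (;;) {
        uint8_t byte = buf[n++];
        result |= (uint32_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0 || shift >= 28)
            break;
        shift += 7;
    }
    *out = result;
    return n;
}
```

With this sketch, the value 300 takes 4 bytes in the bitshift form and 2 bytes (0xAC 0x02) as a varint, while a zero still costs a full byte as a varint. That last byte is what the compressed variant avoids.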

You can download the code here.

You can compile the code in debug mode as follows:

gcc -g -o perf_test2 perf_test2.c

Or compile it in optimized mode as follows (-O3 provided the best optimizations):

gcc -O3 -o perf_test2 perf_test2.c

and run it thus:

./perf_test2

Let me know how it works for you and whether you see any further optimizations.
