Packing bits

I was very interested to read this post by Kristian Nielsen on BIT-aligned storage.

He is comparing the time it takes to get numbers from memory, actually from memory and summing them, with the numbers in their native format, packed to a fixed number of bits and packed to a variable number of bits. Interestingly there was little difference between native format and packed to a fixed number of bits, which really makes sense to me, I have found that bit-shifting is a very cheap operation for modern CPUs.

This was also interesting to me because you usually need to store lots of numbers when you are doing full text indexing. While I have not found the need to pack numbers in memory, I always do so when storing them to disk and transmitting them over the network.

This has two benefits:

The first is that packed numbers are much quicker to read from disk (and write but that is less important) than unpacked numbers, especially if you have small numbers (which you will have if you use deltas as much as possible). This is really not new, I did my first tests on a Centris 610 running A/UX.

The second benefit is that you store the numbers in an architecture neutral format because you don’t have to deal with big-endian vs. little-endian differences. While this is less of an issue today because the i386/AMD64/EMT64 platform is pretty much ubiquitous, it was much more of an issue a decade ago.

The storage mechanism is pretty simple, you grab the lower 7 bits, store those and set the high bit if there are more bits to read, in code:

/* Masks for compressed numbers storage */
#define UTL_NUM_COMPRESSED_CONTINUE_BIT				(0x80)
#define UTL_NUM_COMPRESSED_DATA_MASK				(0x7F)
#define UTL_NUM_COMPRESSED_DATA_BITS				(7)


/* Macro to get the number of bytes occupied by a compressed 32 bit integer.
** The integer to evaluate should be set in uiMacroValue and the number of 
** bytes is placed in uiMacroByteCount
*/
#define UTL_NUM_GET_COMPRESSED_UINT_SIZE(uiMacroValue, uiMacroByteCount) \
	{	\
		if ( uiMacroValue < (1UL << 7) )	\
			uiMacroByteCount = 1;	\
		else if ( uiMacroValue < (1UL << 14) )	\
			uiMacroByteCount = 2;	\
		else if ( uiMacroValue < (1UL << 21) )	\
			uiMacroByteCount = 3;	\
		else if ( uiMacroValue < (1UL << 28) )	\
			uiMacroByteCount = 4;	\
		else	\
			uiMacroByteCount = 5;	\
	}

/* Macro for reading a compressed integer from memory. The integer is 
** read starting at pucMacroPtr and is stored in uiMacroValue
*/
#define UTL_NUM_READ_COMPRESSED_UINT(uiMacroValue, pucMacroPtr) \
	{	unsigned char ucMacroByte = '';	\
		ASSERT(pucMacroPtr != NULL);	\
		uiMacroValue = 0;	\
		do { 	\
			uiMacroValue = uiMacroValue << UTL_NUM_COMPRESSED_DATA_BITS;	\
			ucMacroByte = *(pucMacroPtr++);	\
			uiMacroValue += (ucMacroByte & UTL_NUM_COMPRESSED_DATA_MASK); 	\
		}	\
		while ( ucMacroByte & UTL_NUM_COMPRESSED_CONTINUE_BIT );	\
	}



/* Macro for compressing an unsigned integer and writing it to memory */
#define UTL_NUM_READ_COMPRESSED_UINT(uiMacroValue, pucMacroPtr) \
	{	\
		ASSERT(pucMacroPtr != NULL);	\
		for ( uiMacroValue = 0; uiMacroValue <= 0);	\
		ASSERT(pucMacroPtr != NULL);	\
		UTL_NUM_GET_COMPRESSED_UINT_SIZE(uiMacroLocalValue, uiMacroByteCount); \
		for ( uiMacroBytesLeft = uiMacroByteCount; uiMacroBytesLeft > 0; uiMacroBytesLeft-- ) { 	\
			ucMacroByte = (unsigned char)(uiMacroLocalValue & UTL_NUM_COMPRESSED_DATA_MASK);	\
			if ( uiMacroBytesLeft != uiMacroByteCount )	{ \
				ucMacroByte |= UTL_NUM_COMPRESSED_CONTINUE_BIT;	\
			}	\
			pucMacroPtr[uiMacroBytesLeft - 1] = ucMacroByte;	\
			uiMacroLocalValue >>= UTL_NUM_COMPRESSED_DATA_BITS;	\
		}	\
		pucMacroPtr += uiMacroByteCount; 	\
		ASSERT(uiMacroLocalValue == 0);	\
	}
 

I use macros here because they get called a *lot* and so are faster than functions (even inline ones I think.)

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: