Compressing text

Elias Torres has a very interesting post about an investigation into compressing content stored in MySQL.

At Feedster, feed post content is compressed and stored in MySQL. When I was there, we were using MySQL 4.1.x, which did not support compression natively, so we had to roll our own.

What we did was use zlib to compress the content and store the compressed version only if it was smaller than the original. This check matters because some content actually came out larger after compression. So when we extracted the content, we had to check the first two bytes and decompress the content if it started with the bytes 120 and 156 (0x78 0x9C, the header zlib writes at its default settings). We stored all our content in utf-8, and that byte pair is not valid utf-8, so we knew that we would never decompress plain content by mistake.
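A minimal sketch of that scheme in Python, assuming the text is utf-8 and always compressed at zlib's default level (which is what produces the 0x78 0x9C header). The names `pack` and `unpack` are hypothetical; this illustrates the idea and is not Feedster's actual code.

```python
import zlib

# Header bytes zlib.compress() writes at the default compression level.
ZLIB_HEADER = b"\x78\x9c"


def pack(text: str) -> bytes:
    """Encode text as utf-8 and compress it, keeping the compressed form only if smaller."""
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw)  # default level, so the stream starts with 0x78 0x9C
    return compressed if len(compressed) < len(raw) else raw


def unpack(blob: bytes) -> str:
    """Decompress blob if it starts with the zlib header, then decode it as utf-8."""
    if blob[:2] == ZLIB_HEADER:
        blob = zlib.decompress(blob)
    return blob.decode("utf-8")
```

A short string such as `"hi"` comes back from `pack` unchanged, because zlib's header and checksum alone make the compressed form longer. The check is safe because 0x9C is a utf-8 continuation byte, so it can never directly follow an ASCII byte like 0x78 in valid utf-8. It does rely on every value being compressed at the same level, since zlib's second header byte changes with the compression level.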

The decompression was done by whatever client accessed the data (an API in our case), and we generally found that this was not onerous to do.

