<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>François Schiettecatte's Blog &#187; Software Development</title>
	<atom:link href="http://fschiettecatte.wordpress.com/category/software-development/feed/" rel="self" type="application/rss+xml" />
	<link>http://fschiettecatte.wordpress.com</link>
	<description>Thoughts from the edge of the 'net</description>
	<lastBuildDate>Thu, 07 Jan 2010 17:42:39 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='fschiettecatte.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/d68519c10e24c7ac1d88ef92dbc2f1ac?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>François Schiettecatte's Blog &#187; Software Development</title>
		<link>http://fschiettecatte.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://fschiettecatte.wordpress.com/osd.xml" title="François Schiettecatte&#8217;s Blog" />
		<item>
		<title>Google Open Source Projects</title>
		<link>http://fschiettecatte.wordpress.com/2010/01/02/google-open-source-projects/</link>
		<comments>http://fschiettecatte.wordpress.com/2010/01/02/google-open-source-projects/#comments</comments>
		<pubDate>Sat, 02 Jan 2010 19:56:26 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1239</guid>
		<description><![CDATA[Nice list of Google Open Source Projects.
Posted in Software Development]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Nice list of <a href="http://blog.0x1fff.com/2009/12/34-projekty-open-source-udostepnione.html">Google Open Source Projects</a>.</p>
Posted in Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2010/01/02/google-open-source-projects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>Why The Name NoSQL Is Meaningless (To Me)</title>
		<link>http://fschiettecatte.wordpress.com/2009/12/13/why-the-name-nosql-is-meaningless-to-me/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/12/13/why-the-name-nosql-is-meaningless-to-me/#comments</comments>
		<pubDate>Sun, 13 Dec 2009 20:28:38 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Scaling]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1227</guid>
		<description><![CDATA[The &#8216;NoSQL&#8217; movement has gotten quite popular lately and with good reason, it is breaking new ground on distributed, scalable storage.
But the name &#8216;NoSQL&#8217; really bugs me, because SQL is just a query language, it is not a storage technology.  This is well illustrated in &#8220;InnoDB is a NoSQL database&#8221;, which I will quote [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>The <a href="http://en.wikipedia.org/wiki/NoSQL">&#8216;NoSQL&#8217;</a> movement has gotten quite popular lately and with good reason, it is breaking new ground on distributed, scalable storage.</p>
<p>But the name &#8216;NoSQL&#8217; really bugs me, because SQL is just a query language, it is not a storage technology.  This is well illustrated in <a href="http://www.xaprb.com/blog/2009/12/13/innodb-is-a-nosql-database/">&#8220;InnoDB is a NoSQL database&#8221;</a>, which I will quote below:</p>
<blockquote><p>
As long as the whole world is chasing this meaningless “NoSQL” buzzword, we should recognize that InnoDB is usable as an embedded database without an SQL interface. Hence, it is as much of a NoSQL database as anything else labeled with that term. And I might add, it is fast, reliable, and extremely well-tested in the real world. How many NoSQL databases have protection against partial page writes, for example?</p>
<p>It so happens that you can slap an SQL front-end on it, if you want: MySQL.
</p></blockquote>
<p>Another thing, it is probably better to say what you are for rather than what you are against, much more constructive. Time to get a new name/acronym I think.</p>
<p>Updated December 18th, 2009 &#8211; I am seeing that <a href="http://natishalom.typepad.com/nati_shaloms_blog/2009/12/the-common-principles-behind-the-nosql-alternatives.html?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed%3A+NatiShalom+%28Nati+Shalom%27s+Blog%29">NoSQL is being renamed to mean Not Only SQL</a>, which I think is much better.</p>
Posted in Scaling, Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/12/13/why-the-name-nosql-is-meaningless-to-me/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>exit() Rather Than free()</title>
		<link>http://fschiettecatte.wordpress.com/2009/12/10/exit-rather-than-free/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/12/10/exit-rather-than-free/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 18:30:14 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Scaling]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1225</guid>
		<description><![CDATA[I have to admit that I had a bit of a reaction to this post, apologies for quoting more than 50% of the post here but here goes:

See, developers are perfectionists, and their perfectionism also includes the crazy idea that all memory has to be deallocated at server shutdown, as otherwise Valgrind and other tools [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I have to admit that I had a bit of a reaction to this <a href="http://mituzas.lt/2009/12/10/best-free-is-exit/">post</a>, apologies for quoting more than 50% of the post here but here goes:</p>
<blockquote><p>
See, developers are perfectionists, and their perfectionism also includes the crazy idea that all memory has to be deallocated at server shutdown, as otherwise Valgrind and other tools will complain that someone leaked memory. Developers will write expensive code in shutdown routines that will traverse every memory structure and deallocate/free() it.</p>
<p>Now, guess what would happen if they wouldn’t write all this expensive memory deallocation code.</p>
<p>Still guessing?</p>
<p>OS would do it for them, much much much faster, without blocking the shutdown for minutes or using excessive amounts of CPU. \o/
</p></blockquote>
<p>I am really uncomfortable with the approach of using exit() for memory cleanup, for the obvious reason that it is usually much, much cheaper to keep a process running than to shut it down and restart it on a regular basis. The other reason is that relying on exit() for memory cleanup is just poor hygiene.</p>
<p>Reminds me of the days of SunOS where common wisdom said that restarting a server once a week was a good idea to keep the memory leaks in check.</p>
Posted in Scaling, Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/12/10/exit-rather-than-free/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>Chrome OS</title>
		<link>http://fschiettecatte.wordpress.com/2009/11/22/chrome-os/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/11/22/chrome-os/#comments</comments>
		<pubDate>Sun, 22 Nov 2009 23:13:41 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1205</guid>
		<description><![CDATA[I have been snowed under with work lately and was only able to spend a little time playing with Google&#8217;s new Chrome OS. 
I did find a very good review by Paul Thurrott though, well worth reading.
Posted in General, Software Development]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I have been snowed under with work lately and was only able to spend a little time playing with <a href="http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html">Google&#8217;s new Chrome OS</a>. </p>
<p>I did find a <a href="http://www.winsupersite.com/alt/chromeos_preview2.asp">very good review by Paul Thurrott</a> though, well worth reading.</p>
Posted in General, Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/11/22/chrome-os/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>Fedora 12</title>
		<link>http://fschiettecatte.wordpress.com/2009/11/19/fedora-12/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/11/19/fedora-12/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 15:05:36 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1200</guid>
		<description><![CDATA[Just upgraded to Fedora 12, much nicer than Fedora 11.
And the upgrade process was smooth which was welcome.
Posted in Software Development]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Just upgraded to <a href="http://fedoraproject.org/">Fedora 12</a>, much nicer than Fedora 11.</p>
<p>And the upgrade process was smooth which was welcome.</p>
Posted in Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/11/19/fedora-12/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>InnoDB Compression</title>
		<link>http://fschiettecatte.wordpress.com/2009/11/19/innodb-compression/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/11/19/innodb-compression/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 15:03:18 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Scaling]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1193</guid>
		<description><![CDATA[I had a few hours to spare a couple of days ago and decided to check InnoDB Plugin 1.0&#8217;s support for data compression.
In a project I work on from time to time, there is a table with three blobs that contain text data. To store the data efficiently I was using the COMPRESS() function [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I had a few hours to spare a couple of days ago and decided to check <a href="http://www.innodb.com/doc/innodb_plugin-1.0/index.html">InnoDB Plugin 1.0&#8217;s</a> support for <a href="http://www.innodb.com/doc/innodb_plugin-1.0/innodb-compression.html">data compression</a>.</p>
<p>In a project I work on from time to time, there is a table with three blobs that contain text data. To store the data efficiently I was using the <a href="http://dev.mysql.com/doc/refman/5.1/en/encryption-functions.html#function_compress">COMPRESS()</a> function in MySQL and doing a &#8220;<a href="http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html">CONVERT</a>(<a href="http://dev.mysql.com/doc/refman/5.1/en/encryption-functions.html#function_uncompress">UNCOMPRESS</a>(text) USING utf8)&#8221; to uncompress the data and present it as utf8. No problems there, but with the recent move to the InnoDB Plugin 1.0 in MySQL 5.1 there was an opportunity to push that down the stack.</p>
<p>I ran a few benchmarks and it turned out that using 8K pages was the optimal trade-off between space and time. Using 16K pages did not compress the data very well, and using pages smaller than 8K increased the time needed to store the data. I should note that 8K is also the default.</p>
<p>There are some interesting wrinkles in all this: innodb_file_per_table needs to be enabled, and I think innodb_file_format needs to be set to &#8216;barracuda&#8217;, though I am not sure about that.</p>
Posted in Scaling, Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/11/19/innodb-compression/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>Number Encoding II</title>
		<link>http://fschiettecatte.wordpress.com/2009/10/16/number-encoding-ii/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/10/16/number-encoding-ii/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 15:32:40 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Scaling]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1166</guid>
		<description><![CDATA[To conclude my little foray into number encoding (see the presentation by Jeffrey Dean from Google titled “Challenges in Building Large-Scale Information Retrieval Systems” (video, slides)), here are a few conclusions:

In terms of raw performance the &#8220;Varint Encoding&#8221; is much faster than “Byte-Aligned Variable-length Encodings” and I was able to get better numbers than Google [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>To conclude my little foray into <a href="http://fschiettecatte.wordpress.com/2009/09/29/number-encoding/">number encoding</a> (see the presentation by Jeffrey Dean from Google titled “Challenges in Building Large-Scale Information Retrieval Systems” (<a href="http://videolectures.net/wsdm09_dean_cblirs/">video</a>, <a href="http://research.google.com/people/jeff/WSDM09-keynote.pdf">slides</a>)), here are a few conclusions:</p>
<ul>
<li>In terms of raw performance the &#8220;Varint Encoding&#8221; is much faster than “Byte-Aligned Variable-length Encodings” and I was able to get better numbers than Google got, most likely because I am using a different machine. It would be interesting to know what kind of machine/OS they used for their timings so I could do a direct comparison. My lookup array structure is different from (and more compact than) Google&#8217;s, assuming I understood Google&#8217;s lookup array structure in the presentation.
</li>
<li>The “Byte-Aligned Variable-length Encodings” is faster if you are storing three numbers per posting, namely a document ID, a term position and a field ID. The “Group Varint Encoding” is faster if you are storing four numbers per posting, namely a document ID, a term position, a field ID and a weight.
</li>
<li>As I described in the <a href="http://fschiettecatte.wordpress.com/2009/09/29/number-encoding/#comment-4358">last comment in the original post</a>, two bits are used in the header for each varint to indicate its size in bytes, so 0, 1, 2 or 3 indicate whether your varint is 1, 2, 3 or 4 bytes long respectively. However, if you store deltas, a lot of the numbers you store will be 0, and using a byte to store 0 seems wasteful to me. So I changed this so that the two bits indicate the actual number of bytes in the varint, and 0 bytes means 0. This way I don’t actually allocate space unless there is a value other than 0 to store. This saves about 10% in my overall index size, and a lot more if you only take the term postings into account because I store some amount of document metadata in my index. Of course this means that you can’t store a number greater than 16,777,215 (the largest value that fits in three bytes), which won’t happen unless you are creating huge indices with more than 16,777,215 documents in them or have documents longer than 16,777,215 terms.
</li>
</ul>
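<p>The zero-length variant described in the last bullet can be sketched roughly as follows, in Python. This is only an illustration and not the code behind the timings above: the function names, the little-endian byte order and the order of the two-bit fields within the header byte are all assumptions made for the example.</p>

```python
MAX_VALUE = (1 << 24) - 1  # largest value that fits in three bytes


def byte_length(value):
    """Bytes needed to store value; a value of 0 needs no bytes at all."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("value does not fit a 2-bit length field")
    return (value.bit_length() + 7) // 8


def encode_group(values):
    """Encode exactly four integers as one header byte plus 0 to 12 data bytes."""
    if len(values) != 4:
        raise ValueError("a group holds exactly four values")
    header = 0
    body = bytearray()
    for slot, value in enumerate(values):
        size = byte_length(value)
        header |= size << (2 * slot)  # two bits per value (assumed slot order)
        body += value.to_bytes(size, "little")  # assumed byte order
    return bytes([header]) + bytes(body)


def decode_group(data, offset=0):
    """Decode one group starting at offset; return (values, next_offset)."""
    header = data[offset]
    offset += 1
    values = []
    for slot in range(4):
        size = (header >> (2 * slot)) & 0b11
        # An empty slice decodes to 0, so zero-length values come back as 0.
        values.append(int.from_bytes(data[offset:offset + size], "little"))
        offset += size
    return values, offset
```

<p>In this sketch a group of four zero deltas takes a single header byte, whereas a scheme that always spends at least one byte per number would need five.</p>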
<p>Basically it comes down to trade-offs, index compactness vs. decode speed, and looking at speed both in test code (usually a contrived example) and performance on a real data set. I used the Wikipedia data for that along with 200 relatively complex searches designed to read lots of postings lists.</p>
Posted in Scaling, Search, Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/10/16/number-encoding-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>Danger Danger</title>
		<link>http://fschiettecatte.wordpress.com/2009/10/13/danger-danger/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/10/13/danger-danger/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 01:12:41 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Scaling]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1164</guid>
		<description><![CDATA[Plenty has been written about the Danger data loss over the weekend (TechCrunch). For me the most interesting commentary came from John C. Dvorak, he got some things right but he also got some things wrong:

Over the past week, users of the T-Mobile Sidekick platform found that all their contacts and other important information was [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Plenty has been written about the Danger data loss over the weekend (<a href="http://www.techcrunch.com/2009/10/10/t-mobile-sidekick-disaster-microsofts-servers-crashed-and-they-dont-have-a-backup/">TechCrunch</a>). For me <a href="http://www.pcmag.com/article2/0,2817,2354118,00.asp?kc=PCRSS03079TX1K0000584">the most interesting commentary came from John C. Dvorak</a>, he got some things right but he also got some things wrong:</p>
<blockquote><p>
Over the past week, users of the T-Mobile Sidekick platform found that all their contacts and other important information was permanently lost, because of server mishaps. If Microsoft had wanted to throw a monkey wrench into cloud computing, it could not have done a better job.
</p></blockquote>
<p>Huh, don&#8217;t think so, this was a data loss screw-up, nothing to do with the cloud. If what we are reading is to be believed, a SAN upgrade went wrong and the data was lost, with no backup.</p>
<p>Insert <a href="http://video.google.com/videoplay?docid=-7872246776955336366#">Ellen Feiss ad</a> here&#8230;</p>
<p>Seriously though backups are essential because things will go wrong. Note that I use &#8216;will&#8217; and not &#8216;may&#8217;, and these may or may not be under our control.</p>
<p>The other thing I used to tell clients is to do a fire drill on a regular basis, by that I mean taking the backups and making sure they can be restored properly and completely. I had one client who discovered that all their backups were useless when they checked.</p>
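<p>One way to automate part of such a fire drill is to restore the backup to a scratch directory and compare file checksums against the source. The Python sketch below is only an illustration: the function names are made up for the example, and it assumes the source files have not changed since the backup was taken (otherwise compare against checksums recorded at backup time).</p>

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path relative to root onto its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(source_root, restored_root):
    """Return the sorted relative paths that are missing, extra or different."""
    source = tree_digests(source_root)
    restored = tree_digests(restored_root)
    return sorted(
        name
        for name in source.keys() | restored.keys()
        if source.get(name) != restored.get(name)
    )
```

<p>An empty list from verify_restore() means every file came back intact; anything else is a list of paths to investigate.</p>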
Posted in General, Scaling, Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/10/13/danger-danger/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>Number Encoding</title>
		<link>http://fschiettecatte.wordpress.com/2009/09/29/number-encoding/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/09/29/number-encoding/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 23:47:22 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Scaling]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1142</guid>
		<description><![CDATA[A while back I came across this very interesting presentation by Jeffrey Dean from Google titled &#8220;Challenges in Building Large-Scale Information Retrieval Systems&#8221; (video, slides) where he talks about the various challenges that Google ran into over time as they scaled up and how they solved them.
This abstract sets the stage pretty well
Building and operating [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=fschiettecatte.wordpress.com&blog=642152&post=1142&subd=fschiettecatte&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>A while back I came across this very interesting presentation by Jeffrey Dean from Google titled &#8220;Challenges in Building Large-Scale Information Retrieval Systems&#8221; (<a href="http://videolectures.net/wsdm09_dean_cblirs/">video</a>, <a href="http://research.google.com/people/jeff/WSDM09-keynote.pdf">slides</a>) where he talks about the various challenges that Google ran into over time as they scaled up and how they solved them.</p>
<p>This abstract sets the stage pretty well:</p>
<blockquote><p>Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world provides a number of interesting challenges. Designing such systems requires making complex design tradeoffs in a number of dimensions, including (a) the number of user queries that must be handled per second and the response latency to these requests, (b) the number and size of various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval.</p>
<p>In this talk I&#8217;ll discuss the evolution of Google&#8217;s hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I&#8217;ll also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems.</p>
<p>Finally, I&#8217;ll describe some future challenges and open research problems in this area.</p></blockquote>
<p>What particularly appealed to my (gnarly) mind were the various algorithms they tried for encoding numbers in their index (pages 54 to 63 of the <a href="http://research.google.com/people/jeff/WSDM09-keynote.pdf">slides</a>).</p>
<p>Having written a search engine, I was particularly interested in how they approached that. I actually settled on what they call &#8220;Byte-Aligned Variable-length Encodings&#8221; (slide 56) ten years ago after doing a bunch of benchmarking (which I repeated every few years to see how well it fared on newer generations of processors). The encoding is very compact and does very well when you have powerful CPUs and weak IO. Over time I made some revisions to the C code that drives it, to steer the optimizer, but it has held up well. I used this encoding for the Feedster search engine. As an aside, one of the pluses of this encoding is that it is endian independent, since it is read and written one byte at a time.</p>
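<p>For readers who have not met this kind of encoding, here is a minimal sketch of the common 7-bits-per-byte scheme, where the high bit of each byte says whether another byte follows. This is not the Feedster code: the function names <code>varint_encode</code> and <code>varint_decode</code> are made up for illustration, and putting the low-order 7 bits first is an assumption (some implementations put the high-order bits first).</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a 32-bit value into out (at most 5 bytes).
   The low 7 bits go first; the high bit is set on every byte except the
   last one. Returns the number of bytes written. */
size_t varint_encode(uint32_t value, uint8_t *out)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value | 0x80);  /* low 7 bits + continuation bit */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;               /* last byte, high bit clear */
    return n;
}

/* Hypothetical helper: decode one value from in.
   Returns the number of bytes consumed (1..5). */
size_t varint_decode(const uint8_t *in, uint32_t *value)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    do {
        byte = in[n++];
        result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);   /* stop after 5 bytes at most */

    *value = result;
    return n;
}
```

<p>With this layout the number 300 takes two bytes (0xAC, 0x02), and anything below 128 takes one. Because the bytes are produced and consumed one at a time, the stored form is the same on big-endian and little-endian machines.</p>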
<p>So I was very interested in the &#8220;Group Varint Encoding&#8221; (slide 63), specifically the claim that it decodes faster.</p>
<p>Google was able to achieve a decode speed of ~180M numbers/second for the Byte-Aligned Variable-length Encodings and ~400M numbers/second for the Group Varint Encoding.</p>
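<p>To make the comparison concrete, here is a rough sketch of the group idea as I understand it from the slides: four numbers share one tag byte that holds four 2-bit lengths, and each number then takes 1 to 4 bytes. The function names, the order of the length fields within the tag byte, and the least-significant-byte-first layout are all assumptions of this sketch. It also uses plain loops rather than the table lookup a tuned decoder would use, so it shows the format, not the speed.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes (1..4) needed to hold v. */
static unsigned byte_length(uint32_t v)
{
    if (v < (1u << 8))  return 1;
    if (v < (1u << 16)) return 2;
    if (v < (1u << 24)) return 3;
    return 4;
}

/* Hypothetical helper: encode four values as one tag byte followed by
   each value in 1..4 bytes, least significant byte first. The tag holds
   four 2-bit (length - 1) fields, first value in the top two bits.
   Writes 5..17 bytes to out and returns the count. */
size_t group_varint_encode(const uint32_t values[4], uint8_t *out)
{
    size_t n = 1;                       /* out[0] is reserved for the tag */
    uint8_t tag = 0;

    for (int i = 0; i < 4; i++) {
        unsigned len = byte_length(values[i]);
        tag |= (uint8_t)((len - 1) << (6 - 2 * i));
        for (unsigned b = 0; b < len; b++)
            out[n++] = (uint8_t)(values[i] >> (8 * b));
    }
    out[0] = tag;
    return n;
}

/* Hypothetical helper: decode one group of four values from in.
   Returns the number of bytes consumed (5..17). */
size_t group_varint_decode(const uint8_t *in, uint32_t values[4])
{
    size_t n = 1;
    uint8_t tag = in[0];

    for (int i = 0; i < 4; i++) {
        unsigned len = ((tag >> (6 - 2 * i)) & 3u) + 1;
        uint32_t v = 0;
        for (unsigned b = 0; b < len; b++)
            v |= (uint32_t)in[n++] << (8 * b);
        values[i] = v;
    }
    return n;
}
```

<p>The point of the layout is that the decoder learns all four lengths from a single byte, instead of testing a continuation bit on every byte as the byte-aligned encoding does.</p>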
<p>So I set out to replicate the results and ran into some interesting things.</p>
<p>I used two machines for this: an Intel DX58SO motherboard with a 2.66GHz Core i7 920 CPU and 6GB of RAM, running Fedora 11 with gcc 4.1.1; and a Mac Pro with two 2.66GHz dual-core Xeons and 13GB of RAM, running Mac OS X 10.5.8 with gcc 4.2.1.</p>
<p>The test code is pretty simple: one test writes and reads numbers from the same area of memory, and the second test writes and reads a sequence of numbers over 1GB of memory. I added in verification code to check that the number read was the same as the number written, to simulate some work in between reading the data.</p>
<p>My results for Byte-Aligned Variable-length Encodings were better: I was able to get ~226M numbers/second on the DX58SO and ~190M numbers/second on the Mac Pro. Removing the verification code, I was able to get ~426M and ~405M numbers/second respectively.</p>
<p>And the results for the Group Varint Encoding were much better: I was able to get ~680M numbers/second on the DX58SO and ~500M numbers/second on the Mac Pro. Removing the verification code made reading pretty much instantaneous (less than 1 microsecond to read 4.8B numbers).</p>
<p>UPDATED October 9th, 2009 &#8211; Obviously my own BS meter was off when I wrote that last paragraph. As I point out in the comments, the optimizer was doing away with most of the code because the results were not used. Preventing it from doing that gave me the same results as the ones I got with the verification code in place, which was certainly not instantaneous.</p>
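<p>One common way to stop an optimizer from throwing away a benchmark loop is to make the decoded values feed something it has to keep, such as a checksum stored to a <code>volatile</code> variable. The sketch below illustrates that idea only; it is not the test code used for the numbers above, and <code>decode_and_sum</code>, <code>time_decode</code> and <code>benchmark_sink</code> are hypothetical names. It assumes the 7-bit, low-bits-first byte-aligned layout.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* A store to a volatile object is an observable side effect, so the
   compiler has to keep the computation that produces the stored value. */
static volatile uint32_t benchmark_sink;

/* Decode every 7-bit varint in buf[0..len) and return the sum of the
   decoded values (wrapping modulo 2^32). */
uint32_t decode_and_sum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t pos = 0;

    while (pos < len) {
        uint32_t value = 0;
        unsigned shift = 0;
        uint8_t byte;

        do {
            byte = buf[pos++];
            value |= (uint32_t)(byte & 0x7F) << shift;
            shift += 7;
        } while ((byte & 0x80) && shift < 35 && pos < len);

        sum += value;
    }
    return sum;
}

/* Time one decode pass over buf and return the elapsed CPU seconds. */
double time_decode(const uint8_t *buf, size_t len)
{
    clock_t start = clock();

    benchmark_sink = decode_and_sum(buf, len);  /* result is now "used" */

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

<p>Summing the values is cheap compared to comparing each one against what was written, so it sits somewhere between the &#8220;with verification&#8221; and &#8220;no verification&#8221; runs in terms of extra work. Note that <code>clock()</code> has coarse resolution, so this only makes sense over a large buffer.</p>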
Posted in Scaling, Search, Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/09/29/number-encoding/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
		<item>
		<title>Boston MySQL User Group Meeting Videos</title>
		<link>http://fschiettecatte.wordpress.com/2009/08/03/boston-mysql-user-group-meeting-videos/</link>
		<comments>http://fschiettecatte.wordpress.com/2009/08/03/boston-mysql-user-group-meeting-videos/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 01:39:27 +0000</pubDate>
		<dc:creator>François Schiettecatte</dc:creator>
				<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://fschiettecatte.wordpress.com/?p=1123</guid>
		<description><![CDATA[There is a MySQL User Group in the Boston area, which I keep meaning to attend but never get to go because of time pressures. So I was happy to see that they are making meeting videos available on YouTube.
Posted in Software Development       <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=fschiettecatte.wordpress.com&blog=642152&post=1123&subd=fschiettecatte&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>There is a <a href="http://www.meetup.com/mysqlbos/">MySQL User Group in the Boston area</a>, which I keep meaning to attend but never get to because of time pressures. So I was happy to see that they are making <a href="http://www.youtube.com/view_play_list?p=7BDA30AED87AC2F9">meeting videos available on YouTube</a>.</p>
Posted in Software Development</div>]]></content:encoded>
			<wfw:commentRss>http://fschiettecatte.wordpress.com/2009/08/03/boston-mysql-user-group-meeting-videos/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6a1c159367b376c46ec40efebed6798e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">fschiettecatte</media:title>
		</media:content>
	</item>
	</channel>
</rss>