Google performance tools – part quatre

I have written three posts (part 1, part 2 & part 3) about the Google performance tools, in which I claimed that I did not see much improvement from using the alternative memory allocator.

It turns out that my test was flawed. The test I ran was to create an in-memory index, a process which requires the allocation of lots of little bits of memory. Unfortunately the process runs as a single thread, which is not the use case the alternative memory allocator is supposed to address. It is supposed to address the case where there are lots of threads running, all allocating memory.

So…

I set up and ran a different test. A search engine I have been playing around with here allows me to set it up as a number of processes and/or a number of threads. So I took a chunk of data (2GB of text), created an index (2GB of index) and proceeded to run a large number of tests, setting up the search engine first as 20 instances with a single thread each, and then as one instance with 20 threads.
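For anyone who wants to run a similar comparison, here is a minimal sketch of a client-side load driver. It is not the harness I used. `fake_search` is a hypothetical stand-in for a real request to the search engine, and the client counts and request counts are arbitrary assumptions. The idea is simply to measure searches/second at increasing numbers of concurrent clients.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query):
    """Hypothetical stand-in for one search request; swap in a real client call."""
    time.sleep(0.001)  # pretend the engine takes about 1 ms to answer
    return query


def measure_throughput(search, clients, requests_per_client=50):
    """Return searches/second observed with `clients` concurrent callers."""
    def worker(client_id):
        for i in range(requests_per_client):
            search(f"query-{client_id}-{i}")

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        # list() waits for every worker and re-raises any exception
        list(pool.map(worker, range(clients)))
    elapsed = time.perf_counter() - start
    return clients * requests_per_client / elapsed


def sweep(search, client_counts=(1, 2, 4, 8, 20)):
    """Measure throughput at each concurrency level."""
    return {n: measure_throughput(search, n) for n in client_counts}


if __name__ == "__main__":
    for n, rate in sweep(fake_search).items():
        print(f"{n:2d} clients: {rate:8.1f} searches/sec")
```

Running the same sweep once against each server configuration (and once per allocator) gives curves that can be compared directly.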

I am not going to publish the raw data here, as it is not all that interesting, but I will describe what I noticed.

When running the search engine as 20 instances with a single thread each, the Google alternative memory allocator actually delivers about 10% worse performance than the memory allocator in the glibc library with the search cache turned off, but about 10% better performance with the search cache turned on.

When running the search engine as one instance with 20 threads, the results are quite different. The Google alternative memory allocator delivers about 10-15% better performance than the memory allocator in the glibc library with the search cache turned off, and 25-30% better performance with the search cache turned on.

What is also interesting is the shape of the performance curve (the machine sports dual Xeon processors with hyperthreading). When running 20 instances with a single thread each, the curve flattens out after peaking at 4 concurrent clients (peaking at 520 searches/sec, flattening out at 500 searches/sec). When running one instance with 20 threads, it drops before flattening out (peaking at 860 searches/sec, flattening out at 570 searches/sec).
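To put those throughput figures in perspective, here is a small worked calculation. It only uses the four numbers quoted above. The dictionary layout and the helper name `drop_from_peak` are mine, and I have not broken the figures down by allocator or cache setting, so treat this as arithmetic on the headline numbers only.

```python
# Figures quoted in the post (searches/second).
results = {
    "20 instances x 1 thread": {"peak": 520, "plateau": 500},
    "1 instance x 20 threads": {"peak": 860, "plateau": 570},
}


def drop_from_peak(peak, plateau):
    """Percentage of peak throughput lost once the curve flattens out."""
    return (peak - plateau) / peak * 100


for setup, r in results.items():
    print(f"{setup}: {drop_from_peak(r['peak'], r['plateau']):.1f}% below peak")

threaded = results["1 instance x 20 threads"]
multi_proc = results["20 instances x 1 thread"]

# Relative advantage of the threaded setup over the multi-process one.
peak_gain = (threaded["peak"] / multi_proc["peak"] - 1) * 100
plateau_gain = (threaded["plateau"] / multi_proc["plateau"] - 1) * 100
print(f"threaded vs multi-process: +{peak_gain:.1f}% at peak, "
      f"+{plateau_gain:.1f}% at plateau")
```

So the multi-process setup loses only about 4% from its peak, while the threaded setup loses about a third of its peak throughput yet still ends up roughly 14% ahead once both curves have flattened out.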

So my results bear out that the Google alternative memory allocator delivers when used in a threaded application.
