Optimization, profiling and telemetry

I recently had an interesting conversation with a colleague about optimization, profiling and telemetry as they relate to application performance.

About optimization, Donald Knuth said:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” (Knuth, Donald. Structured Programming with go to Statements, ACM Journal Computing Surveys, Vol 6, No. 4, Dec. 1974. p.268.)

This makes perfect sense, since you don’t know where the bottlenecks are going to be before building a system. On the other hand, it does not mean that you should throw performant algorithms out of the window. You still have to think about performance at a local level when you write your code.

Profiling the application means running it with preset data and inputs to see where the performance bottlenecks are, at which point you can optimize the application. A profile will tell you where time is being spent, down to the function, and sometimes down to the line. I have found it useful to always reuse the same set of data and inputs so I can track the performance effects of changes. Of course, you need to make sure that the data set and inputs provide good code coverage.
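As a rough illustration of profiling against a fixed data set, here is a minimal Python sketch using the standard cProfile and pstats modules. The workload (slow_sort), the helper names and the seed are all made up for the example; the point is only that the inputs are identical on every run, so timings can be compared across code changes.

```python
import cProfile
import io
import pstats
import random


def build_fixed_inputs(size=500, seed=42):
    """Return the same pseudo-random list on every run (fixed seed)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1_000_000) for _ in range(size)]


def slow_sort(values):
    """Hypothetical workload: a deliberately naive insertion sort."""
    result = []
    for v in values:
        i = 0
        # Linear scan for the insertion point; this is the hot spot.
        while i < len(result) and result[i] < v:
            i += 1
        result.insert(i, v)
    return result


def profile_workload(workload, inputs, top=5):
    """Run workload(inputs) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(workload, inputs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    inputs = build_fixed_inputs()
    _, report = profile_workload(slow_sort, inputs)
    print(report)
```

Running this before and after a change, with the same seed, gives two reports that can be compared function by function.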

Profiling should also give you a good idea of how long operations take to perform. This becomes important when you take your system to production and start gathering telemetry.

Telemetry (credit goes to Dan Pritchett for the term) is the collection of data on a running application: tracking how long certain operations take, such as getting data from a database or running a search. This data can be collected and monitored.

For example, if you have to contact a number of web services to construct a web page, you should capture the time each of those services takes. That way you will be able to see the maximum, minimum, mean and median times for each service. By monitoring this data you will be able to tell when a service is underperforming and take action.
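Here is a minimal Python sketch of that idea, not a production telemetry system. The Telemetry class, the service names and the threshold are all hypothetical. It times each call, keeps the samples in memory, and reports the minimum, maximum, mean and median per service.

```python
import statistics
import time
from contextlib import contextmanager


class Telemetry:
    """In-memory store of call durations, keyed by service name."""

    def __init__(self):
        self._samples = {}

    def record(self, name, seconds):
        """Store one duration (in seconds) for the named service."""
        self._samples.setdefault(name, []).append(seconds)

    @contextmanager
    def timed(self, name):
        """Time the enclosed block and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, time.perf_counter() - start)

    def summary(self, name):
        """Return count, min, max, mean and median for one service."""
        samples = self._samples.get(name)
        if not samples:
            raise KeyError(f"no samples recorded for {name!r}")
        return {
            "count": len(samples),
            "min": min(samples),
            "max": max(samples),
            "mean": statistics.mean(samples),
            "median": statistics.median(samples),
        }

    def slow_services(self, threshold_seconds):
        """Names whose median duration exceeds the (assumed) threshold."""
        return sorted(
            name
            for name, samples in self._samples.items()
            if statistics.median(samples) > threshold_seconds
        )


if __name__ == "__main__":
    telemetry = Telemetry()
    # Hypothetical services, simulated here with short sleeps.
    for name, delay in (("search", 0.02), ("profile", 0.005)):
        for _ in range(3):
            with telemetry.timed(name):
                time.sleep(delay)
    for name in ("search", "profile"):
        print(name, telemetry.summary(name))
    print("slow:", telemetry.slow_services(0.01))
```

In a real system the samples would be shipped to a monitoring service rather than held in memory, and the threshold would come from the timings you measured while profiling.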

I want two things on my Mac

There are two things I want on my Mac, but I can’t find an easy way to do them (by easy, I mean not doing any coding).

The first one is to have Quick Look open XML documents. To me, not having that seems to be quite an oversight.

The second is a little more involved. I would like to be able to add stationery documents to my contextual menu. What I envision is a right click in the Finder giving me a list of possible documents to create in the selected Finder window. This list would be built from documents I put in a folder somewhere, in the “Contextual Menu Items” folder for example.

Microsoft Office 2008

Two interesting reviews of Microsoft Office 2008 for the Mac, one from Paul Thurrott and the other from MacInTouch. I have also had a peek at the software and it looks pretty good.

I have not upgraded my version of Microsoft Office for a while and I think I will do it this time.

MacWorld has finally wrapped up

MacWorld has finally wrapped up, and now that the heady excitement of Steve Jobs’ keynote has passed, here is my 2 cents’ worth on what was announced.

To me, by far the most interesting announcement is the AppleTV take 2.0. I don’t think there was any value in buying movies online: the prices were no better than buying the actual DVD, the quality was not there and all the extras were missing. There just wasn’t any value for the consumer, and it showed in the number of AppleTVs sold and probably in the number of movies sold. Going for a rental model makes much more sense (as we can see from all the other rental options out there). I think the price is competitive, and we shall have to see what the quality looks like. One wrinkle, though: I wonder how much of a long tail there will be. I don’t watch the bulk of new releases and I tend to rent from the long tail. If those movies are not made available, I am not likely to cancel my Netflix account anytime soon.

The second most interesting announcement is the MacBook Air. I have not seen one in person, but I think it looks very attractive. A lot has been written about the compromises that had to be made to build it, but to me it is very appealing for a number of reasons. I already have a MacBook Pro, I only use wireless networking, I don’t usually have more than 50GB of data on it, and I hardly ever use the internal DVD drive. It is really just a mobile extension of my Mac Pro. So I am in the target demographic for the MacBook Air. While I probably won’t be buying one just yet, I will likely do so when I come to replace my MacBook Pro.

The third most interesting announcement is Time Capsule. I think integrating a drive directly into the AirPort Extreme Base Station makes a lot of sense, and I think that this would make a very good, simple backup device for the home. However, for the prosumer I think it falls a little short. While the USB port is still there, it is too slow for attaching any kind of respectable storage. What I would have liked to see is either a FireWire 800 port or an eSATA port, so I could connect a storage appliance to it.

More on “The Database Column vs. MapReduce”

Greg Linden has published an interesting analysis of the column in The Database Column comparing MapReduce with DBMSs.

Most relevantly, he notes:

The most compelling part of the post for me is their argument that some algorithms require random access to data, something that is not well supported by GFS, and it is not always easy or efficient to restructure those algorithms primarily to do sequential scans.

I fully agree with this. If you require random access to data, MapReduce is not likely to be the best tool to use, and they realize that.

Singapore contest

The state of Singapore is organizing a contest, with a $100,000 prize, to develop a search engine.

I think it will take more than that to get anyone serious involved, maybe a prize of $1,000,000 or more.

Not so Interesting Items

I am getting some pretty weird stuff in the “Interesting Items” section of my Google search history.

Sometimes I only get one related search along with a very odd list of related pages. Right now I am getting very odd stuff in both lists.

I wonder what is going on there…