Time to clean house

Periodically I have to clean house, meaning that I close down all those online accounts which I opened and never used, unsubscribe from lists that are no longer useful, cut out RSS feeds I don’t read, ‘unfriend’ people I never talk to, and go through all the computer gear I have amassed and get rid of unwanted and unused stuff.

Now is that time.

UPDATED – 7/30/08 – Cleaned up a lot of stuff and got one taker on craigslist for all of it, but then never heard from them again. Unbelievable! I guess free means the person does not value the stuff.

When not to send a drive in even if it is under warranty

One of my drives failed over the weekend. It was the main drive on the main computer (I had a current backup, so I lost no data), and it raised the question of whether or not to send the drive in under warranty. The drive is an expensive 150GB WD Raptor that spins at 10,000 RPM (it does make a difference), so I would like to get it fixed under the warranty. The problem is that the drive won’t even spin up, so I can’t wipe it.

So it won’t get sent in because of the data that is on it; there is way too much personal stuff there. I will use it as a paperweight for a while, take it apart to satisfy my inner geek, and then take a hammer to it.

Grouper getting a make-over

Last week I posted something small, so this week it is time for something big (I try to alternate.)

This is a Nassau Grouper getting a make-over from some Peterson Cleaner Shrimp at a cleaning station. If you look carefully you can see the shrimp in its mouth.

A cleaning station is a place where fish (and other creatures) can come and get a cleaning from resident shrimps and other cleaning fishes. The cleaning consists of removing dead skin and flesh, as well as any parasites that may be on the fish getting cleaned.

A reef will have cleaning stations all over; you only need to look for places where fish are hovering, usually at a slight angle, with their mouths and/or gills open. This is usually the signal that they want a cleaning rather than hunting for dinner!

For this particular shot I was able to loiter for about 5 minutes, taking lots of shots as the shrimp clambered all over the grouper to clean it. Cleaning stations are a great place to get good shots because the fish are usually very relaxed and don’t move, so if you are patient you will usually be rewarded with great shots.

The oddity here is that I could not see the Corkscrew Anemone that Peterson Cleaner Shrimp usually hide in, but maybe it was tucked away somewhere out of sight.

Peterson Cleaner Shrimp will also clean divers’ hands if you let them, which I have done before and which will be the subject of a future post.

“stuck with a multilanguage future”

At the end of a fairly predictable article where SOAP and REST supporters take cheap shots at each other, Tim Bray being one of them, Bray comes out with some pretty eye-rolling stuff:

During a keynote presentation at OSCON on Friday, Bray will talk about the “language inflection point,” in which various languages such as Perl, Python, and Ruby have been gathering momentum at the expense of the established Java and .Net platforms.

“Up until two years ago, if you were a serious programmer you wrote code in either Java or .Net,” Bray said. “[Now], there are all these options that people are looking at and it’s really an inflection point.”

I fail to see what “serious programmer” and specific languages have to do with each other; I would have thought that a “serious programmer” would pick the language best suited to the task at hand.

The Java platform is accommodating scripting languages such as Ruby and Python on the JVM, Bray noted. Sun has been enabling these to work on the Java Virtual Machine. “The Java language is not what the cool kids are choosing to use these days,” said Bray.

IMHO the “cool kids” who are really smart learn a variety of languages and keep learning new ones. They do this to increase the breadth of their knowledge and toolbox, so they don’t approach every programming problem with the same hammer.

Still, Java will stay around, he said. “The Java language isn’t going away. It’s the world’s most popular programming language,” Bray said.

I have not seen any specific figures on how popular a given language is; in fact, how would you even measure that? Lines written? Programmers using it? Users using applications written in it?

“I think that like it or not, we’re stuck with a multilanguage future,” he stressed.

What’s not to like about a “multilanguage future”? We have a multilanguage present and we have had a multilanguage past; multilanguage has served us well and will continue to do so. As for being “stuck”, I am glad we were not “stuck” 30 years ago, otherwise we would all be writing stuff in COBOL, or worse, assembler.

Landlines disappearing near you soon

Came across this article on dvorak.org/blog about the decline in landlines across the US:

With millions of Americans snapping up the iPhone, AT&T, the exclusive U.S. carrier for the popular phone, should be quite pleased with the stream of revenue it can expect from customers.

But AT&T, the biggest telecommunications company in the United States, has a problem: analysts say consumers are dropping traditional landlines faster than expected. The company, which still gets 32 percent of its revenue from its landline business, was scheduled to report its second-quarter financial results Wednesday and was expected to talk about how its traditional phone service is contracting.

AT&T is not the only company facing a changing environment in the communications business. All of the major U.S. telecommunications companies – AT&T, Verizon and Sprint Nextel – are figuring out how to make more money from customers as they spend more time sending text messages or browsing the Web on their cellphones, rather than talking.

I don’t think this has anything to do with the iPhone; it has everything to do with the fact that cell phones have almost completely replaced landlines, making them redundant, and with the fact that Skype is so cheap. Consumers are using cell phones for the convenience and Skype (or whatever system) for the (lack of) cost. A landline costs money and is inconvenient because you can’t take it with you.

And I don’t buy the argument that the cell phone companies are having a hard time making money: we pay for the minutes whether we use them or not, and the minimum cell phone plan costs four times what the minimum landline plan costs, not to mention the contract and the penalties for early termination. Finally, deploying cell phone infrastructure is a lot cheaper than deploying landlines; you only have to look at Europe and Asia to see that.

The business of telephony is changing, out with the old and in with the new.

Apple iPhone as a keyboard remote for AppleTV

I was very happy to see that you can use the iPhone Remote application not only as a remote for your AppleTV and for iTunes, but also as a keyboard when the AppleTV needs text input, for example when searching for movies (found on Daring Fireball, via MacRumors.com).

Very cool indeed, I have been wanting something like that for a while.

Language recognition

Recently I needed a language recognition library to identify the language of specific chunks of text. I asked a network of colleagues here in the Boston area and they came up with the following:

  1. LingPipe
  2. TextCat
  3. Simile

There is also:

  1. Lingua::Identify
  2. The language identifier in Nutch

And all this led to:

  1. TCatNG Toolkit
  2. TextCat derivatives and current home in SpamAssassin

In the end it seemed simple enough to write my own, using the text collection in TextCat as source material for the n-grams and their associated frequencies.
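For anyone curious what “writing my own” involves, here is a minimal sketch of the TextCat approach (Cavnar and Trenkle’s rank-order n-gram method): rank the most frequent character n-grams for each language, rank them the same way for the unknown text, and pick the language whose ranking is closest by the “out-of-place” distance. This is an illustration, not the code I actually wrote. The function names, the 1 to 5 character n-gram range, the 300-entry profile size, and the two toy training sentences are all assumptions; real profiles should be built from a proper corpus such as the TextCat language samples. Requires Python 3.9+.

```python
import re
from collections import Counter

# Toy training text for illustration only; real profiles need far more text per language.
EN_SAMPLE = ("the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun "
             "while the fox runs through the forest with the other animals")
FR_SAMPLE = ("le renard brun rapide saute par dessus le chien paresseux et puis le chien dort "
             "au soleil pendant que le renard court dans la forêt avec les autres animaux")


def ngram_profile(text: str, max_n: int = 5, top_k: int = 300) -> list[str]:
    """Rank character n-grams (lengths 1..max_n) by frequency and keep the top_k."""
    counts = Counter()
    # Split into runs of letters; pad each word with "_" so n-grams capture word boundaries.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]


def out_of_place(doc_profile: list[str], lang_profile: list[str], max_penalty: int = 300) -> int:
    """Sum of rank differences; n-grams missing from the language profile cost max_penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    return sum(abs(i - rank[gram]) if gram in rank else max_penalty
               for i, gram in enumerate(doc_profile))


def identify(text: str, lang_profiles: dict[str, list[str]]) -> str:
    """Return the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))


if __name__ == "__main__":
    profiles = {"english": ngram_profile(EN_SAMPLE), "french": ngram_profile(FR_SAMPLE)}
    print(identify("the dog and the fox", profiles))
    print(identify("le chien et le renard", profiles))
```

The penalty for a missing n-gram is what does most of the work on short inputs: an English phrase shares almost none of its longer n-grams (“_the_”, “fox”) with a French profile, so the French distance balloons.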

As for a corpus, I stumbled onto this project: “Corpus building for minority languages” which led to a status page, which led to the Declaration of Human Rights in 335 languages.

Flamingo Tongue

This little guy is a Flamingo Tongue. The coloring you see is not the shell but a protective mantle that is deployed when they are awake; the shell itself is white. They feed on coral, which is where they are usually seen.

Taking pictures of these guys can be challenging since they are usually found on soft corals, which sway with the surge or current, so they are constantly moving back and forth. I have lots of out-of-focus pictures of them and relatively few in-focus ones.

The mantle is really quite beautiful and probably serves as a warning to potential predators, but that is just my own theory.

I have another picture of one here.

To normalize or not to normalize, that is the question

Jeff Atwood published a very interesting post “Maybe Normalizing Isn’t Normal” where he delves into whether you should normalize or denormalize. Be sure to check the comments, or this summary on the High Scalability blog if the number of comments (and tone in some cases) gives you a headache.

The post is very interesting but I took issue with this:

Both solutions have their pros and cons. So let me put the question to you: which is better — a normalized database, or a denormalized database?

Trick question! The answer is that it doesn’t matter! Until you have millions and millions of rows of data, that is. Everything is fast for small n. Even a modest PC by today’s standards — let’s say a dual-core box with 4 gigabytes of memory — will give you near-identical performance in either case for anything but the very largest of databases. Assuming your team can write reasonably well-tuned queries, of course.

While it is true that for small data sets there is no difference in performance whether you normalize your schema or not, it will make a huge difference once your data set grows. Adding to the fun, changing your schema becomes more and more difficult as the data set grows.

Then things settle down:

First, a reality check. It’s partially an act of hubris to imagine your app as the next Flickr, YouTube, or Twitter. As Ted Dziuba so aptly said, scalability is not your problem, getting people to give a shit is. So when it comes to database design, do measure performance, but try to err heavily on the side of sane, simple design. Pick whatever database schema you feel is easiest to understand and work with on a daily basis. It doesn’t have to be all or nothing as I’ve pictured above; you can partially denormalize where it makes sense to do so, and stay fully normalized in other areas where it doesn’t.

A sane, simple design is a “good thing”, but you also need to plan for the future: you want a sane, simple design that can evolve and scale.

Finally sanity is restored:

Pat Helland notes that people normalize because their professors told them to. I’m a bit more pragmatic; I think you should normalize when the data tells you to:

  1. Normalization makes sense to your team.
  2. Normalization provides better performance. (You’re automatically measuring all the queries that flow through your software, right?)
  3. Normalization prevents an onerous amount of duplication or avoids risk of synchronization problems that your problem domain or users are particularly sensitive to.
  4. Normalization allows you to write simpler queries and code.

In my experience (with Feedster, amongst others), a heavily denormalized schema is easy to work with but simply does not scale well.

With my current project I took a different tack:

  • Normalize where it makes sense and group logical chunks of data together, even if it means having 1 to 1 relationships. From a performance point of view this means that you get and update only the chunks you need, rather than accessing tables with 50+ fields where 90% of the fields are null (don’t laugh, I have seen it happen).
  • Never ever ever join to get data; it is better to issue two simple queries than one join. The caveat is that this comes from experience with MySQL and large amounts of data (1/2 TB), where even with indices join performance can be unpredictable.
  • Sharding your data is pretty much the only way to scale, so design that in from the start.
  • Build a data access layer which hides the schema from the application.
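Here is a minimal sketch of how these points can fit together, using Python’s built-in sqlite3 module with one in-memory database per shard standing in for separate MySQL servers. The class, table, and column names are hypothetical, and plain modulo sharding is simply the easiest scheme to show (adding shards later means moving rows), so treat this as an illustration of the shape rather than a description of this project’s actual code.

```python
import sqlite3
from typing import Optional


class UserStore:
    """Hypothetical data access layer: callers never see tables, queries, or shards."""

    def __init__(self, num_shards: int = 4) -> None:
        # One in-memory SQLite database per shard stands in for separate database servers.
        self.shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for db in self.shards:
            # Core account data and a 1:1 profile "chunk" live in separate, narrow tables.
            db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
            db.execute("CREATE TABLE user_profiles ("
                       "user_id INTEGER PRIMARY KEY, display_name TEXT, bio TEXT)")

    def _shard_for(self, user_id: int) -> sqlite3.Connection:
        # Simplest possible scheme: modulo on the key (resharding later means moving rows).
        return self.shards[user_id % len(self.shards)]

    def add_user(self, user_id: int, email: str, display_name: str,
                 bio: Optional[str] = None) -> None:
        db = self._shard_for(user_id)
        with db:  # commits both inserts together, or rolls back if either fails
            db.execute("INSERT INTO users (user_id, email) VALUES (?, ?)", (user_id, email))
            db.execute("INSERT INTO user_profiles (user_id, display_name, bio) VALUES (?, ?, ?)",
                       (user_id, display_name, bio))

    def get_email(self, user_id: int) -> Optional[str]:
        # Touches only the chunk that is needed, not the whole user record.
        row = self._shard_for(user_id).execute(
            "SELECT email FROM users WHERE user_id = ?", (user_id,)).fetchone()
        return row[0] if row else None

    def get_user(self, user_id: int) -> Optional[dict]:
        db = self._shard_for(user_id)
        # Two simple primary-key lookups instead of one JOIN.
        user = db.execute("SELECT email FROM users WHERE user_id = ?", (user_id,)).fetchone()
        if user is None:
            return None
        profile = db.execute("SELECT display_name, bio FROM user_profiles WHERE user_id = ?",
                             (user_id,)).fetchone()
        result = {"user_id": user_id, "email": user[0]}
        if profile is not None:
            result["display_name"], result["bio"] = profile
        return result


if __name__ == "__main__":
    store = UserStore()
    store.add_user(42, "alice@example.com", "Alice")
    print(store.get_user(42))
```

Because the application only calls methods like get_user() and get_email(), the table split, the two-query lookup, and the shard mapping can all change later without touching application code.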

I am sure there is more, but this is a start.

How many cores for InnoDB?

I came across two articles on Planet MySQL with some benchmarks of InnoDB scalability on machines with multiple cores. The first is “MySQL, Innodb, DBT2 Core Scalability Graphs” and the second is “Innodb Multi-core Performance”.

What is interesting is that both seem to suggest that InnoDB peaks at around 8 cores, though I would be curious how many physical CPUs there actually were and what the memory bandwidth was. I guess what I am really saying is that I am curious where the bottleneck actually is.
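One common way to reason about a throughput curve that rises, peaks, and then falls as cores are added is Neil Gunther’s Universal Scalability Law. Neither article uses it; it is brought in here only as a thinking aid. In the model, alpha stands for contention (work serialized behind something like a global mutex) and beta for coherency cost (cores waiting on each other to keep shared state in sync), and relative capacity C(N) peaks at N* = sqrt((1 - alpha) / beta). The alpha and beta values below are made up to place the peak near 8 cores; they are not fitted to either benchmark.

```python
import math


def usl_capacity(n: int, alpha: float, beta: float) -> float:
    """Relative capacity C(N) = N / (1 + alpha*(N - 1) + beta*N*(N - 1)) under the USL."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha: float, beta: float) -> float:
    """Core count where dC/dN = 0, i.e. N* = sqrt((1 - alpha) / beta); requires beta > 0."""
    return math.sqrt((1 - alpha) / beta)


if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration, NOT fitted to the InnoDB benchmarks.
    alpha, beta = 0.03, 0.015
    for cores in (1, 2, 4, 8, 16, 32):
        print(f"{cores:2d} cores -> {usl_capacity(cores, alpha, beta):.2f}x")
    print(f"modelled peak at about {usl_peak(alpha, beta):.1f} cores")
```

If the raw numbers were published, fitting alpha and beta to them would give a rough hint of whether contention or cross-core coordination dominates, though it still would not name the specific bottleneck.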

