Taz Crawlers From China

Continuing with my experiences on the net, another thing I have noticed is the crawlers operating from China (at least, that is where their originating IP addresses trace back to). None are well behaved: they all ignore the robots.txt file. Some are well written insofar as they are efficient, but most are not and will download anything and everything, usually multiple times. I call those Taz crawlers (think Tasmanian Devil), for what should be obvious reasons. We block all these crawlers.
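One way to spot the "download everything, repeatedly" habit in your own logs is to count how often each client fetches the same URL. This is a minimal sketch, assuming access logs in Common/Combined Log Format (client IP in field 1, request path in field 7); `repeat_fetches` is a hypothetical helper name, and the threshold you pass is a judgment call.

```shell
#!/bin/sh
# Hypothetical helper: report clients that request the same URL more than
# THRESHOLD times, the "download everything, repeatedly" crawler habit.
# Assumes Common/Combined Log Format on stdin: field 1 is the client IP and
# field 7 is the request path, e.g.
#   203.0.113.5 - - [10/Apr/2012:10:00:00 +0000] "GET /page HTTP/1.1" 200 512
# Usage: repeat_fetches THRESHOLD < access.log
repeat_fetches() {
    awk -v min="$1" '
        # Tally every (IP, path) pair.
        { count[$1 " " $7]++ }
        # Print "count IP path" for pairs seen more than min times.
        END { for (key in count) if (count[key] > min) print count[key], key }
    ' | sort -rn
}
```

The output is sorted busiest first, so the worst offenders appear at the top.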


DDOS & Dumb Choices

Recently one of the sites I manage was subjected to a DDOS attack. It was not a DDOS attack per se: someone wanted some very specific data from the site and thought it would be a good idea to contract the job out to a ‘bot farm. I say they wanted specific data because the URLs requested were very specific. The net effect was a DDOS, because lots of ‘bots from all around the world were hammering the site for this data, over and over again. We were lucky in that the attack started slowly, so we were able to examine the HTTP requests being used, work out how to screen for them, and turn them away before they got too far down the stack. The attack lasted about 5 days.

A few things to note about this. The HTTP request was easily recognizable, so it could be screened out. The data was spread over 160 pages, but one page summarized all of it, so a single request would have fetched everything. Because we were able to screen out the requests, the ‘bots failed to get the data. And there is a contact form on the site; they could have just asked.
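Tallying request paths is one quick way to see which very specific URLs are being hammered before deciding what to screen for. This is a minimal sketch, assuming access logs in Common/Combined Log Format (request path in field 7); `top_paths` is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# Hypothetical helper: list the N most requested paths with their hit counts.
# Assumes Common/Combined Log Format on stdin, where field 7 is the path.
# Usage: top_paths N < access.log
top_paths() {
    # Count hits per path, sort busiest first, keep the top N.
    awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' \
        | sort -rn \
        | head -n "$1"
}
```

Because the requests came from many different addresses, counting by path rather than by IP is what makes the pattern stand out.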

Lakes and Oceans

Awesome xkcd cartoon today about the relative depths of lakes and oceans.


Baylor-Hopkins Center for Mendelian Genomics

The other project I have been working on is a website for the Baylor-Hopkins Center for Mendelian Genomics. This website is designed to capture patient features and DNA sample information for sequencing single-gene mendelian phenotypes. The site is not really for public consumption, though.

OMIM – Online Mendelian Inheritance in Man

For the curious, I have been working on the OMIM website, on and off, for the past 18 months. From the website:

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known mendelian disorders and over 12,000 genes. OMIM focuses on the relationship between phenotype and genotype. It is updated daily, and the entries contain copious links to other genetics resources.

Worth a look if you are interested in genetics.


Flashback botnet

This is probably the most ‘stable’ article I have read on the Mac Flashback malware exploit, ‘stable’ in the sense that there is no hysteria or hyperbole.

The one thing I would add is that you should check your other browsers as well as Safari:

defaults read /Applications/Firefox.app/Contents/Info LSEnvironment
defaults read /Applications/Google\ Chrome.app/Contents/Info LSEnvironment
defaults read /Applications/Chromium.app/Contents/Info LSEnvironment

In fact, I have removed Flash from the ‘/Library/Internet Plug-Ins’ and ‘~/Library/Internet Plug-Ins’ folders, so Safari and Firefox don’t have Flash on my machine; Google Chrome and Chromium have their own sandboxed versions of Flash. Also, while I have Java installed, it is disabled in all browsers.
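To run the LSEnvironment check against every browser in one go, a small loop works. This is a sketch, assuming the bundle paths below (edit the list to match your machine); `check_browsers` is a hypothetical helper, not an Apple tool. On a clean bundle, `defaults read` reports that the LSEnvironment key does not exist and exits non-zero, so the loop treats a non-zero exit as "ok" (which would also hide other read errors).

```shell
#!/bin/sh
# Sketch: run the LSEnvironment check from the post against several browsers.
# The bundle list below is an assumption; edit it to match your machine.
# check_browsers is a hypothetical helper, not an Apple tool.
check_browsers() {
    for app in "$@"; do
        # Skip browsers that are not installed.
        [ -d "$app" ] || continue
        # `defaults read` exits non-zero when the key is absent (the clean case).
        if defaults read "$app/Contents/Info" LSEnvironment >/dev/null 2>&1; then
            echo "CHECK: $app has an LSEnvironment entry"
        else
            echo "ok: $app"
        fi
    done
}

check_browsers /Applications/Safari.app /Applications/Firefox.app \
    "/Applications/Google Chrome.app" /Applications/Chromium.app
```

Any bundle flagged with "CHECK" deserves a closer look with the original `defaults read` command to see what the entry contains.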