March 19, 2013
Here is what I am currently working on:
Thoughts from the edge of the ‘net
December 1, 2011
I am working on a project that involves Django and ran into an interesting issue. Django creates a number of tables to keep track of various bits of data, one of which is django_session, which holds user session information:
CREATE TABLE django_session (
  session_key varchar(40) NOT NULL,
  session_data longtext NOT NULL,
  expire_date datetime NOT NULL,
  PRIMARY KEY (session_key)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This is all well and good, but there is an issue. InnoDB stores rows in primary key order (the primary key is a clustered index), and here the primary keys are SHA1 hex digests. The problem is that these keys are effectively random, so a new session row can be inserted anywhere in the table, causing data to move around as rows are inserted. While this might work when the table is small, it does not work so well when you have 500,000+ rows in it (which is another issue that I will get to).
A better schema for the table is as follows:
CREATE TABLE django_session (
  id int(11) NOT NULL AUTO_INCREMENT,
  session_key varchar(40) NOT NULL,
  session_data longtext NOT NULL,
  expire_date datetime NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY session_key (session_key)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
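This ensures that rows are inserted consecutively, at the end of the table, which should give better performance as the table grows. The unique index on session_key still receives keys in random order, but that index is much smaller than the table itself.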
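The effect of random keys can be illustrated without MySQL at all. The Python sketch below keeps keys in sorted order, much as InnoDB's clustered index keeps rows in primary key order, and counts how many inserts land at the end of the sequence rather than somewhere in the middle. The SHA1 keys are generated from random numbers purely as hypothetical stand-ins for session keys; this is a simulation, not a measurement of MySQL.

```python
import bisect
import hashlib
import random


def append_fraction(keys):
    """Fraction of inserts that land at the end of the key-ordered sequence.

    A clustered index (as in InnoDB) keeps rows sorted by primary key, so an
    insert that is not an append has to go into the middle of existing data.
    """
    ordered = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos == len(ordered):
            appends += 1
        ordered.insert(pos, key)
    return appends / len(keys) if keys else 0.0


def sha1_keys(n, seed=0):
    """Hypothetical stand-in for session keys: SHA1 hex digests of random data."""
    rng = random.Random(seed)
    return [hashlib.sha1(str(rng.random()).encode()).hexdigest() for _ in range(n)]


if __name__ == "__main__":
    n = 5000
    print("auto-increment ids:", append_fraction(list(range(1, n + 1))))
    print("SHA1 hex keys:     ", append_fraction(sha1_keys(n)))
```

With sequential ids every insert is an append; with hex digests only a tiny fraction are, and every other insert goes somewhere in the middle of the existing data.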
Two things to note:
I am not sure whether Django specifies the ENGINE to use when creating these tables, but MySQL 5.5 uses InnoDB rather than MyISAM as its default storage engine. I don't think this will be an issue with MyISAM, which does not store rows in primary key order.
The other thing is that Django does not seem to clear out sessions past their expiry date, so one needs to do that regularly with the following statement:
DELETE FROM django_session WHERE expire_date <= NOW()
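On MySQL the statement above could simply be run from cron. Django also ships a management command that does the same job (`django-admin.py cleanup`, replaced by `clearsessions` in Django 1.5). For a self-contained illustration, the sketch below uses Python's standard-library sqlite3 module as a stand-in for MySQL and passes the current time as a parameter instead of calling NOW(); the helper names are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

# This sketch stores expire_date as text in this format, which sorts in the
# same order as the timestamps themselves.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def purge_expired_sessions(conn, now=None):
    """Delete sessions whose expire_date is at or before `now`.

    Equivalent to DELETE FROM django_session WHERE expire_date <= NOW(),
    with the current time passed in as a parameter instead of NOW().
    Returns the number of rows deleted.
    """
    if now is None:
        now = datetime.now()
    cur = conn.execute(
        "DELETE FROM django_session WHERE expire_date <= ?",
        (now.strftime(TS_FORMAT),),
    )
    conn.commit()
    return cur.rowcount


def create_session_table(conn):
    """Create a SQLite stand-in for the django_session table."""
    conn.execute(
        "CREATE TABLE django_session ("
        " session_key varchar(40) NOT NULL PRIMARY KEY,"
        " session_data text NOT NULL,"
        " expire_date datetime NOT NULL)"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_session_table(conn)
    now = datetime(2011, 12, 1, 12, 0, 0)
    conn.executemany(
        "INSERT INTO django_session VALUES (?, ?, ?)",
        [
            ("old", "data", (now - timedelta(days=1)).strftime(TS_FORMAT)),
            ("new", "data", (now + timedelta(days=1)).strftime(TS_FORMAT)),
        ],
    )
    print(purge_expired_sessions(conn, now))  # 1
    print(conn.execute("SELECT session_key FROM django_session").fetchall())  # [('new',)]
```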
One more thing: I think the same is true of database-backed caches.
July 27, 2011
BBEdit 10.0 came out last week. I have been a long-time user of BBEdit (pretty much since it came out). This release comes with a lot of changes, a lot, and has required me to change some of my work processes. There are also a number of rough edges, so it might be a good idea to wait for the .1 release. And make sure not to upgrade unless you have some downtime to work through all the changes.
June 26, 2011
To access MySQL from Django you need to install MySQL-python, and this is usually where the trouble begins. MySQL-python will run ‘mysql_config’ to determine the default MySQL settings; the one to pay attention to is ‘--socket’. This is the socket that MySQL-python will use to access MySQL if the HOST entry under DATABASES in the Django settings.py file is left empty or set to ‘localhost’. MySQL-python appears to disregard the settings in /etc/my.cnf, so if you set ‘socket’ in /etc/my.cnf to something else (such as ‘/var/lib/mysql/mysql.sock’), Django will NOT be able to access the MySQL server.
There are two solutions to this: either set ‘socket’ in /etc/my.cnf to match the value reported by ‘mysql_config’, or set HOST in the Django settings.py file to the host name of the machine running MySQL, so that MySQL-python connects over TCP rather than through the socket.
One thing that I have noticed is that MySQL installations that are made through apt-get or yum have the socket default set to ‘/tmp/mysql.sock’ whereas MySQL installations that are made from the MySQL download have the socket default set to ‘/var/lib/mysql/mysql.sock’.
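To check for this mismatch, a small Python helper could compare the socket configured in my.cnf with the one mysql_config reports. This is a sketch: the function names are hypothetical, it assumes the server socket is set in the [mysqld] section, and a my.cnf that has `!include` directives before any section header would need extra handling.

```python
import configparser
import subprocess


def socket_from_my_cnf(text, section="mysqld"):
    """Return the 'socket' value from my.cnf-style text, or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf has bare flags such as skip-external-locking
        strict=False,         # tolerate repeated keys
        interpolation=None,   # values may contain '%'
    )
    parser.read_string(text)
    if not parser.has_section(section):
        return None
    return parser.get(section, "socket", fallback=None)


def socket_from_mysql_config():
    """Ask mysql_config which socket the client library uses by default."""
    result = subprocess.run(
        ["mysql_config", "--socket"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def sockets_match(my_cnf_text, client_socket, section="mysqld"):
    """True if the server socket in my.cnf matches the client default."""
    return socket_from_my_cnf(my_cnf_text, section) == client_socket
```

On a real machine this might be called as `sockets_match(open("/etc/my.cnf").read(), socket_from_mysql_config())`; a False result means Django with an empty or ‘localhost’ HOST will be looking in the wrong place.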
Another issue you may run into comes from installing MySQL in a non-standard directory, for example ‘/usr/local/mysql’: while MySQL-python will probably install correctly, it may not be able to load the MySQL libraries when running under the Apache server.
What I generally do is let MySQL-python install however it wants, and set the HOST to the hostname of the machine where MySQL is running.
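For the HOST approach, the relevant part of settings.py looks roughly like this (the DATABASES format used since Django 1.2). The database name, user, password and host name below are placeholders, not real values.

```python
# settings.py (fragment); NAME, USER, PASSWORD and HOST are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        # A real host name (not "" or "localhost") makes MySQL-python connect
        # over TCP instead of through the client library's default socket.
        "HOST": "db.example.com",
        "PORT": "3306",
    }
}
```

Django's MySQL backend also treats a HOST value that starts with ‘/’ as the path to a Unix socket, so pointing HOST at the socket named in /etc/my.cnf is another way to reconcile the two.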
June 24, 2011
There are lots of moving parts to the site: every night we download data from about 20 sources (about 3GB of data in total), parse it all, and assemble the database and all the links to external resources. Basically, it is a big ETL machine.
What is interesting to me is the range in data quality and the lack of standardization. Actually, the only standard that exists is the comma delimiter. The other interesting thing is that some sites really strive to keep their data up to date while others are much more, shall we say, relaxed about it.
OMIM also now has a Twitter account.
April 26, 2011
I was very interested to read “The gadgets police use to snarf cell phone data” over on ArsTechnica and wondered how long it would be before there were apps that blocked these gadgets, and/or apps which counter-attacked these gadgets with bad data or viruses.
April 23, 2011
A project I am currently working on requires the download of about 750MB of compressed data every night from about 10 different sites. This data is used to build links to other resources, so it would be a ‘bad thing’ if the data were messed up for some reason. The two failure patterns I have run into so far are that the data is no longer there (the file is missing) or that the data is incorrect for some reason (the file is truncated).
So I added a couple of checks to the script that handles the download. The first is that the data is downloaded to a temporary area before being moved to its final location. The other is that I check the size of the new file against the size of the current file. If the two differ in size by more than a certain percentage, the new file is not used and this is flagged. Obviously the threshold will be domain specific, and there may be a direction check as well (i.e. the file should never get smaller).
This is pretty much all I can do: the files don’t have MD5 signatures, and there are no deltas either.
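The two checks can be sketched in Python roughly as follows. The function names, the 20% default threshold and the allow_shrink switch are illustrative assumptions, not the actual script. The temporary file is created next to the final file so that os.replace() can move it into place atomically, which requires both paths to be on the same filesystem.

```python
import os
import shutil
import tempfile
import urllib.request


def size_change_ok(new_size, old_size, threshold=0.2, allow_shrink=True):
    """Check a new file's size against the current one.

    threshold is the maximum allowed relative change (0.2 = 20%); it is an
    illustrative default and will be domain specific.
    """
    if new_size == 0:
        return False  # empty download: treat as missing data
    if old_size == 0:
        return True  # nothing to compare against
    if not allow_shrink and new_size < old_size:
        return False  # direction check: file should never get smaller
    return abs(new_size - old_size) / old_size <= threshold


def fetch_and_install(url, final_path, threshold=0.2, allow_shrink=True):
    """Download url to a temporary file, then move it into place only if it
    passes the size check. Returns True if the new file was installed."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        # A missing file on the server raises here (e.g. HTTPError 404).
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, out)
        new_size = os.path.getsize(tmp_path)
        old_size = os.path.getsize(final_path) if os.path.exists(final_path) else 0
        if not size_change_ok(new_size, old_size, threshold, allow_shrink):
            print(f"flagged: {url} is {new_size} bytes, current is {old_size}")
            return False
        os.replace(tmp_path, final_path)  # atomic on the same filesystem
        return True
    finally:
        # Remove the temporary file if it was not moved into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

If the check fails, the current file is left untouched, so the nightly build keeps using yesterday's data rather than a truncated download.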
March 9, 2011
I just updated three projects I am working on from MySQL 5.1 to 5.5. The updates were very smooth. I backed up my databases first and ran a test update on the smallest of the three installations to weed out any remaining issues in the my.cnf file. I also ran a ‘mysql_upgrade’ check before I updated to make sure there were no lingering issues. Once I was comfortable, upgrading was a breeze.
The one thing I noticed is that MySQL seems to be a bit faster now, but maybe that is just an impression.
March 1, 2011
As part of the work I have done on OMIM, I had to put in an autocomplete feature. I found the jQuery autocomplete plugin, which appears to no longer be supported. jQuery UI has a nice autocomplete implementation, but it does a lot more than I needed and requires buying into the jQuery UI tools.
Then I found jquery-autocomplete on Google Code, which did everything I needed. I made a number of changes to the code to bring its behavior in line with the autocomplete feature on major search engines, and I am also bringing the documentation up to date.
October 26, 2010
I was sorry to see that Benoît Mandelbrot passed away recently. I have always had a soft spot for the Mandelbrot set; in fact, the first C program I wrote drew the Mandelbrot set. I used Turbo C for that, and the result was rendered on a VGA display. I believe the computer was a 386 with 32K of cache, because I worked hard to make sure the program was less than 32K in size so it would fit in the cache.