Interesting Bug in Django

I am working on a project that involves Django and ran into an interesting issue. Django creates a small database to keep track of various bits of data one of which is user session information in a table called django_session as follows:

CREATE TABLE django_session (
  session_key varchar(40) NOT NULL,
  session_data longtext NOT NULL,
  expire_date datetime NOT NULL,
  PRIMARY KEY (session_key)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This is all well and good but there is an issue. InnoDB orders the rows in primary key order (primary keys are SHA1 hex digest). The problem is that these keys are effectively random so a new session row be be inserted anywhere in the table causing data to move around with every insert. While this might work when the table is small, it does not work so well when you have 500,000+ rows in it (which is another issue that I will get to).

A better schema for the table is as follows:

CREATE TABLE django_session (
  id int(11) NOT NULL AUTO_INCREMENT,
  session_key varchar(40) NOT NULL,
  session_data longtext NOT NULL,
  expire_date datetime NOT NULL,
  PRIMARY KEY (id),  
  UNIQUE KEY session_key (session_key)

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This will ensure that rows are inserted consecutively which will ensure better performance as the table grows.

Two things to note:

I am not sure whether Django specifies the ENGINE to use when creating these tables, but MySQL 5.5 uses InnoDB rather than MyISAM, and I don’t think this will be an issue with the latter.

The other thing is that Django does not seem to clear out sessions past their expiry date, so one needs to do that regularly with the following statement:

DELETE FROM django_session WHERE expire_date <= NOW()

One more thing, I think that is the case too with database backed caches too.

 

Follow

Get every new post delivered to your Inbox.