Moving to the Dark Side

Leaving the Pipette for a Keyboard.

Improving database performance

I am a bit of a DB n00b, but for a project I had to set up and query a rather large SQLite database from an existing dump and schema. Here are a couple of things I learned after asking for help.

/dev/shm/

I had no idea this magical place existed on Linux systems! Basically it’s like /tmp, but it lives in RAM, so everything happens much faster. This meant that importing several GB worth of tables went from over a day to about an hour; the final size of the DB was 50 GB.
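As a rough sketch of the workflow (all file names here are made up), the idea is to build the database on /dev/shm and only move it to a normal disk once the heavy lifting is done:

# Create the database on the RAM-backed filesystem and load the schema.
sqlite3 /dev/shm/mydb.sqlite < schema.sql

# ... run the bulk imports here (see the next trick) ...

# Move the finished file to persistent storage; /dev/shm does not survive a reboot.
mv /dev/shm/mydb.sqlite /path/to/storage/mydb.sqlite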

Another trick was to wrap the bulk imports in a TRANSACTION:

BEGIN TRANSACTION;
.separator "\t"
.import protein.tsv protein
.import genome.tsv genome
.import protein_info.tsv protein_info
COMMIT;

Whilst the main purpose of transactions is to keep data consistent, wrapping the imports in one also made them noticeably faster. I am not an expert, quite far from it, but my understanding is that outside an explicit transaction SQLite commits (and syncs to disk) after every statement, whereas here everything is committed once at the end; all I know for sure is that it definitely helped.
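If you save the block above as, say, import_tables.sql (a file name I am making up), it can be fed to the sqlite3 shell in a single session, here pointed at the database sitting in /dev/shm:

# One sqlite3 session, so the BEGIN/COMMIT pair wraps all three .import commands.
sqlite3 /dev/shm/mydb.sqlite < import_tables.sql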

Index tables

The tricks above were helpful in speeding up DB creation, but I then had to contend with fairly complex queries on a large DB, with plenty of joins (merges) going on. CREATE INDEX came to the rescue:

CREATE INDEX seqid_index ON protein_info(seqid); /*speed up queries*/

I didn’t index all the tables, just a few key columns used in the joins. Indexing everything wouldn’t be free either: indexes take extra disk space and slow inserts down a little, so it pays to focus on the columns the queries actually filter and join on.
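A quick way to check that an index is actually being picked up is EXPLAIN QUERY PLAN. The query below is only a sketch: apart from protein_info.seqid, the column names are invented for the example.

EXPLAIN QUERY PLAN
SELECT p.seqid, i.description   -- 'description' is a made-up column
FROM protein AS p
JOIN protein_info AS i ON i.seqid = p.seqid;

-- With seqid_index in place, the plan should report something like
-- "SEARCH protein_info USING INDEX seqid_index (seqid=?)" instead of
-- scanning the whole table.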

Final words

I would still be stuck on the first step if I had not asked for help from a couple of colleagues who are more experienced with databases. One doesn’t have to know everything, but it is good to know people who do.