Improving database performance
I am a bit of a DB n00b but for a project I had to setup and query an rather larger sqlite
database from an existing dump and schema. Here are a couple if things I leaned after asking for help.
/dev/shm/
I had no idea this magical place existed in Unix systems! Basically it’s like /tmp
but in RAM so everything happens much faster. This meant that inserting tables with several GB went down from over a day to about an hour - the final size of the DB was 50GB.
Another trick was to use TRANSACTION
before INSERT
:
BEGIN TRANSACTION;
.separator "\t"
.import protein.tsv protein
.import genome.tsv genome
.import protein_info.tsv protein_info
COMMIT;
Whilst the goal of transactions is to keep data consistent, it also seems to have an effect on the speed of transactions. I am not an expert, quite far from it, all I know is that it definitely helped.
Index tables
If the tricks about were helpful in speeding up db creation, I then had to contend with fairly complex queries in a large db - plenty of merges going on. INDEX
came to the rescue:
CREATE INDEX seqid_index ON protein_info(seqid); /*speed up queries*/
I didn’t index all the tables, just a few key ones, but I guess it wouldn’t harm to have done it for all.
Final words
I would still be stuck in the first step if I had not asked for help from a couple of colleagues that are more experienced in databases. One doesn’t have to know everything, but it is good to know persons that know.