Moving to the Dark Side

Leaving the Pipette for a Keyboard.

How big is my baby?

So I am a bit of a nerd and also a recent progenitor. So of course I had to find ways of analyzing the growth of my F1. What follows is a very brief look into the growth pattern of Em, using the little data I have so far - F1 is only 7 months old.

Read More

Filter overlapping features in bed file

I was trying to do something that should be easy, but I couldn’t find a solution online: remove features from a single bed file when they overlap or lie within a certain distance of each other. In this example:

Read More

Function to find maximal coverage in multiple bigwigs II

[This is an updated version of this post with improved functions and a reproducible example]

Read More

Function to find maximal coverage in multiple bigwigs

I really like the package Gviz for preparing figures for presentations and publications (I have used it in B with some tidying up in Inkscape).

Read More

Sublime Text 3 set-up

I am a big fan of Sublime Text! It is a lightweight text editor with an inexpensive license and, thanks to contributions by hundreds of users, it is highly extensible and customizable. From a practical perspective, I prefer it to IDEs such as Jupyter or RStudio because I also write a lot of little bash/shell scripts, or just one-liners embedded in markdown (my project notebooks). Also, the pipeline I am using is based on groovy. Sometimes I write code in all four languages in a single day, so it is easy to see why I prefer a single development environment over having to memorize different shortcuts and layouts. Personally, it makes my life easier. I also love the multi-line editing features of Sublime Text, the ability to search within projects, etc.

Read More

Custom chromosome sizes for pybedtools

I use pybedtools a lot in my little scripts. One issue I encountered recently came up when using those scripts with a custom genome, in this case when mapping to the transcriptome. One of my scripts calculates coverage using genome_coverage(bg=True, genome=genome), where the argument genome is an input from the command line.
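For a custom reference there is no built-in assembly name to look up, so the sizes have to come from somewhere else. The snippet below is a minimal sketch, not necessarily the approach taken in the full post: it assumes the custom reference is available as a plain-text FASTA file and builds the two-column, tab-separated "genome file" that bedtools reads. The function names `fasta_chrom_sizes` and `write_genome_file` are hypothetical helpers made up for this illustration.

```python
def fasta_chrom_sizes(fasta_path):
    """Return {sequence_name: length} for every record in a FASTA file."""
    sizes = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first word of the header as the sequence name
                name = line[1:].split()[0]
                sizes[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped, so lengths are accumulated
                sizes[name] += len(line)
    return sizes


def write_genome_file(sizes, out_path):
    """Write a bedtools-style genome file: one 'name<TAB>length' line per sequence."""
    with open(out_path, "w") as out:
        for name, length in sizes.items():
            out.write(f"{name}\t{length}\n")
    return out_path
```

The resulting file path could then be taken from the command line and handed to bedtools. In pybedtools I would expect something like `genome_coverage(bg=True, g=path)` to work, but check the documentation of your version for how it treats `g` versus `genome`.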

Read More

Repeat mapping

Most of the projects I am involved with deal with mapping reads to repeat regions of the genome, specifically transposons. While not all genomic repeats have exactly the same sequence, it is nonetheless challenging to accurately map as many reads as possible - more reads mapped -> more information (for the same €€).

Read More

Kill all jobs with a name

I had some jobs waiting in queue with non-consecutive job id numbers, but all with the same job name. Mistakes were made and they needed killing. A solution would be to copy-paste all the relevant job IDs and go:

Read More

Testing for over-representation of anything

Recently I wrote a post on how to test for chromosome over-representation in a list of genes. The solution, which I thought was clever at the time, can be simplified and applied to test whether the overlap between two lists of genes is significant. Let’s use the pasilla data again:

Read More

Merge fastq sample from different lanes and rename them

This is something I need to do often, and a colleague asked me how to do it herself. So the best way to share it is to post it on the blog.
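As a rough idea of what such a merge can look like, here is a minimal Python sketch; the full post may well use a different approach. It assumes Illumina-style file names of the form `<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`, and the function name `merge_lanes` and the output naming `<sample>_<read>.fastq.gz` are choices made up for this illustration. It relies on the fact that gzip files can be concatenated byte for byte and still form a valid gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
LANE_PATTERN = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$"
)


def merge_lanes(in_dir, out_dir):
    """Concatenate per-lane FASTQ files into one file per sample and read."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group lane files by (sample, read); files not matching the pattern are skipped
    groups = defaultdict(list)
    for path in sorted(in_dir.glob("*.fastq.gz")):
        match = LANE_PATTERN.match(path.name)
        if match:
            groups[(match["sample"], match["read"])].append(path)

    merged = []
    for (sample, read), lanes in sorted(groups.items()):
        target = out_dir / f"{sample}_{read}.fastq.gz"
        with open(target, "wb") as out:
            # Lanes are sorted by name, so R1 and R2 are merged in the same order
            for lane in lanes:
                with open(lane, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged.append(target)
    return merged
```

Calling `merge_lanes("raw_fastq", "merged_fastq")` (both directory names are placeholders) would return the list of merged files. Keeping the lane order identical for R1 and R2 matters for paired-end data, because the read pairs must stay in sync.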

Read More