Moving to the Dark Side

Leaving the Pipette for a Keyboard.

Function to find maximal coverage in multiple bigwigs II

[This is an updated version of this post with improved functions and a reproducible example]

Function to find maximal coverage in multiple bigwigs

I really like the Gviz package for preparing figures for presentations and publications (I have used it in B, with some tidying up in Inkscape).

Sublime Text 3 set-up

I am a big fan of Sublime Text! It is a lightweight text editor with an inexpensive license and, thanks to contributions from hundreds of users, it is highly extensible and customizable. From a practical perspective, I prefer it to IDEs such as Jupyter or RStudio because I also write a lot of little bash/shell scripts, or just one-liners embedded in markdown (my project notebooks). The pipeline I am using is also based on Groovy. Sometimes I write code in all 4 languages in a single day, so it is easy to see why I prefer a single development environment over memorizing different shortcuts and layouts. Personally, it makes my life easier. I also love the multi-line editing features of Sublime Text, the ability to search within projects, etc.

Custom chromosome sizes for pybedtools

I use pybedtools a lot in my little scripts. One issue I encountered recently came up when using those scripts with a custom genome, in this case when mapping to the transcriptome. One of my scripts calculates coverage using genome_coverage(bg=True, genome=genome), where the argument genome is an input from the command line.
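A minimal sketch of one way to handle custom chromosome sizes, not necessarily the approach taken in the full post. It assumes a two-column "name, length" chromosome sizes file (the helper `read_chromsizes` and the transcript names are made up for illustration) and relies on my understanding that pybedtools accepts either an assembly name or a dict of `{chrom: (0, length)}` for `genome=`, and a file path for `g=`.

```python
from typing import Dict, Iterable, Tuple


def read_chromsizes(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Parse 'name<whitespace>length' lines into {name: (0, length)}.

    Hypothetical helper: blank lines and '#' comments are skipped, and
    only the first two columns are used (so a .fai index also works).
    """
    sizes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, length = line.split()[:2]
        sizes[name] = (0, int(length))
    return sizes


# Toy input standing in for the contents of a transcriptome sizes file.
example = ["transcript_A\t1500", "transcript_B\t320", ""]
custom_genome = read_chromsizes(example)
print(custom_genome)

# Assumption: with pybedtools installed, the dict could be passed on as
#   pybedtools.BedTool("reads.bam").genome_coverage(bg=True, genome=custom_genome)
# or the sizes file handed over directly with g="path/to/sizes.txt".
```

In a command-line script, the same argument could then be treated as an assembly name when no such file exists and as a sizes file otherwise.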

Repeat mapping

Most of the projects I am involved with deal with mapping reads to repeat regions of the genome, specifically transposons. While not all genomic repeats have exactly the same sequence, it is nonetheless challenging to map as many reads as possible accurately: more reads mapped -> more information (for the same €€).

Kill all jobs with a name

I had some jobs waiting in the queue with non-consecutive job ID numbers, but all with the same job name. Mistakes were made and they needed killing. A solution would be to copy-paste all the relevant job IDs and go:

Testing for over-representation of anything

Recently I wrote a post on how to test for chromosome over-representation in a list of genes. The solution, which I thought was clever at the time, can be simplified and applied to test whether the overlap between two lists of genes is significant. Let’s use the pasilla data again:
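As a data-free illustration of the idea, here is a minimal sketch of one common way to test an overlap: a one-sided hypergeometric test. This is an assumption about a reasonable method, not the code from the full post, and the function name `overlap_pvalue` and the toy numbers are hypothetical.

```python
from math import comb


def overlap_pvalue(universe: int, list_a: int, list_b: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` shared genes by chance.

    universe: number of genes that could have been picked at all
    list_a, list_b: sizes of the two gene lists
    overlap: number of genes present in both lists
    """
    max_shared = min(list_a, list_b)
    # Number of ways to draw list_b genes with exactly k of them in list_a,
    # summed over all k at least as large as the observed overlap.
    tail = sum(
        comb(list_a, k) * comb(universe - list_a, list_b - k)
        for k in range(overlap, max_shared + 1)
    )
    return tail / comb(universe, list_b)


# Toy numbers: 10 genes in total, lists of 4 and 3 genes, 2 genes shared.
print(overlap_pvalue(universe=10, list_a=4, list_b=3, overlap=2))
```

The choice of universe matters a lot here: using only the genes that were actually testable, rather than every annotated gene, gives a more conservative p-value.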

Merge fastq samples from different lanes and rename them

This is something I need to do often, and a colleague asked me how to do it herself. So the best way to share it is to post it on the blog.
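A minimal sketch of the general idea, not the recipe from the full post. It assumes Illumina-style file names such as `sample_S1_L001_R1_001.fastq.gz` (the pattern, the function names and the output naming are all assumptions) and relies on the fact that gzip files can be concatenated byte for byte into a valid gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: <sample>_L<lane>_<R1|R2>_001.fastq.gz
LANE_PATTERN = re.compile(r"^(?P<sample>.+)_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$")


def plan_merges(filenames):
    """Group lane files by sample and read, keyed by the merged file name."""
    groups = defaultdict(list)
    # Sorting keeps lanes in the same order for R1 and R2, which matters
    # for paired-end data.
    for name in sorted(filenames):
        match = LANE_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a per-lane fastq file, leave it alone
        merged = f"{match['sample']}_{match['read']}.fastq.gz"
        groups[merged].append(name)
    return dict(groups)


def merge_lanes(plan, out_dir):
    """Concatenate the lane files of each group into one renamed file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for merged_name, lane_files in plan.items():
        with open(out_dir / merged_name, "wb") as out:
            for lane_file in lane_files:
                with open(lane_file, "rb") as src:
                    shutil.copyfileobj(src, out)


plan = plan_merges([
    "s1_S1_L002_R1_001.fastq.gz",
    "s1_S1_L001_R1_001.fastq.gz",
    "s1_S1_L001_R2_001.fastq.gz",
    "notes.txt",
])
print(plan)
```

Printing the plan before calling `merge_lanes(plan, "merged")` makes it easy to check the grouping and the new names before any file is written.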

Finding the closest element to a number in a vector

A colleague came to my office the other day with an interesting question:

Table of results embedded in a PDF

Warning: This is a rant.
