I am a big fan of Sublime Text! It is a lightweight text editor with an inexpensive license and, thanks to contributions from hundreds of users, it is highly extensible and customizable. From a practical perspective, I prefer it to IDEs such as Jupyter or RStudio because I also write a lot of little bash/shell scripts, or just one-liners embedded in markdown (my project notebooks). Also, the pipeline I am using is based on Groovy. Sometimes I write code in all four languages in a single day, so it is easy to see why I prefer a single development environment instead of having to memorize different shortcuts and layouts. Personally, it makes my life easier. I also love Sublime Text's multi-line editing features, the ability to search within projects, etc.
I use pybedtools a lot in my little scripts. One issue I encountered recently was when using those scripts with a custom genome, in this case when mapping to the transcriptome. One of my scripts calculates coverage using genome_coverage(bg=True, genome=genome), where the argument genome is an input from the command line.
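The excerpt stops before the fix, so here is one possible approach, offered as a sketch under assumptions rather than the solution from the post: instead of passing an assembly name, hand pybedtools the sequence lengths directly. The helper below builds a genome dictionary of (start, stop) tuples, the layout pybedtools uses for its chromsizes dictionaries, from a two-column chrom-sizes file or a samtools .fai index of the transcriptome. The function names are hypothetical.

```python
def parse_chromsizes(lines):
    """Turn chrom-sizes lines into a pybedtools-style genome dict.

    Accepts the two-column "name<TAB>length" format and also samtools
    .fai indexes (only the first two columns are used). Each value is a
    (start, stop) tuple.
    """
    genome = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        genome[fields[0]] = (0, int(fields[1]))
    return genome


def read_chromsizes(path):
    """Read a chrom-sizes or .fai file from disk (see parse_chromsizes)."""
    with open(path) as handle:
        return parse_chromsizes(handle)
```

Usage would then look something like BedTool("reads.bed").genome_coverage(bg=True, genome=read_chromsizes("transcriptome.fa.fai")), with hypothetical file names. If your pybedtools version does not accept a dictionary for genome, passing the path of a chrom-sizes file as g=... forwards it to bedtools' -g option instead.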
Most of the projects I am involved with deal with mapping reads to repeat regions of the genome, specifically transposons. Although not all genomic repeats have exactly the same sequence, it is nonetheless challenging to map as many reads as possible accurately: more reads mapped -> more information (for the same €€).
I had some jobs waiting in the queue with non-consecutive job ID numbers, but all with the same job name. Mistakes were made and they needed killing. One solution would be to copy-paste all the relevant job IDs and go:
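The excerpt does not say which scheduler is involved, so the sketch below assumes SLURM: squeue -h -o "%i %j" prints one "job ID, job name" pair per line, and scancel takes a list of job IDs. Only the parsing function is scheduler-agnostic; other schedulers would need a different listing command. All function names are hypothetical.

```python
import getpass
import subprocess


def job_ids_by_name(listing, job_name):
    """Return the IDs of all jobs whose name is exactly `job_name`.

    `listing` must hold one "JOBID JOBNAME" pair per line, e.g. the
    output of SLURM's `squeue -h -o "%i %j"` (assumed scheduler).
    """
    ids = []
    for line in listing.splitlines():
        # Split off the ID and keep the rest of the line as the job name.
        fields = line.strip().split(None, 1)
        if len(fields) == 2 and fields[1].strip() == job_name:
            ids.append(fields[0])
    return ids


def cancel_jobs_named(job_name, dry_run=True):
    """Cancel the current user's jobs called `job_name` (SLURM assumed)."""
    listing = subprocess.run(
        ["squeue", "-h", "-u", getpass.getuser(), "-o", "%i %j"],
        capture_output=True, text=True, check=True,
    ).stdout
    ids = job_ids_by_name(listing, job_name)
    if not ids:
        return ids  # nothing to cancel
    command = ["scancel", *ids]
    if dry_run:
        print(" ".join(command))  # show the command instead of running it
    else:
        subprocess.run(command, check=True)
    return ids
```

Running cancel_jobs_named("my_job") first in dry-run mode shows which IDs would be killed before anything is cancelled. On SLURM specifically, scancel --name=JOBNAME does the same in one step.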
Recently I wrote a post on how to test for chromosome over-representation in a list of genes. The solution, which I thought was clever at the time, can be simplified to test whether the overlap between two lists of genes is significant. Let's use the pasilla data again:
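The idea can be sketched in a few lines of Python, independently of the pasilla example: restrict both gene lists to a common universe (assumed here to be all genes that were tested), count the overlap, and compute the probability of an overlap at least that large under the hypergeometric distribution. That tail probability is the one-sided ("greater") Fisher's exact test p-value for the 2x2 table in list A / not in A versus in list B / not in B. The function name and gene IDs are hypothetical.

```python
from math import comb


def overlap_test(genes_a, genes_b, universe):
    """One-sided Fisher's exact test for the overlap of two gene lists.

    Returns the overlap size and P(overlap >= observed) under the
    hypergeometric null, with both lists restricted to `universe`.
    """
    universe = set(universe)
    a = set(genes_a) & universe
    b = set(genes_b) & universe
    n, k = len(universe), len(a & b)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(len(a), i) * comb(n - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    return k, tail / comb(n, len(b))
```

For example, with a universe of ten genes and two identical five-gene lists, the overlap is 5 and p = 1/252. scipy.stats.fisher_exact(table, alternative="greater") on the corresponding 2x2 table should give the same p-value.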
This is something I need to do often, and a colleague asked me how to do it herself, so the best way to share it is to post it on the blog.
A colleague came to my office the other day with an interesting question:
I was minding my own business, trying to add labels to a line plot in ggplot2. Then I saw that the package directlabels would solve all my problems with a single line of code. I proceeded to install it using install.packages("directlabels", repo="http://r-forge.r-project.org"). Sadly:
Sometimes, when working on some data, I notice certain biases: say, differentially expressed genes appearing to originate more often from one chromosome, or a factor binding more often to a class of transcripts. In these situations I tend to turn to Fisher's exact test. Here is an example of what I do.
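As a minimal sketch of the chromosome case (not necessarily the exact code from the post), each chromosome gets its own 2x2 table: genes on the chromosome versus elsewhere, crossed with differentially expressed versus not. The one-sided p-value is the hypergeometric probability of seeing at least as many DE genes on that chromosome. The input format, a mapping from gene ID to chromosome plus a list of DE gene IDs, and the function names are assumptions.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided ('greater') Fisher's exact test for the 2x2 table
    [[a, b], [c, d]]: P(top-left cell >= a) with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(
        comb(row1, i) * comb(n - row1, col1 - i)
        for i in range(a, min(row1, col1) + 1)
    )
    return tail / comb(n, col1)


def chromosome_enrichment(gene_chrom, de_genes):
    """p-value per chromosome for over-representation of DE genes."""
    de = set(de_genes)
    results = {}
    for chrom in sorted(set(gene_chrom.values())):
        on_chrom = [g for g, ch in gene_chrom.items() if ch == chrom]
        a = sum(g in de for g in on_chrom)          # on chrom, DE
        b = len(on_chrom) - a                       # on chrom, not DE
        c = sum(g in de for g in gene_chrom) - a    # elsewhere, DE
        d = len(gene_chrom) - a - b - c             # elsewhere, not DE
        results[chrom] = fisher_greater(a, b, c, d)
    return results
```

Since every chromosome is tested, the resulting p-values should be corrected for multiple testing (for example with Benjamini-Hochberg) before reading too much into any single one.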