I have been toying with the idea of making an R package for some time. To me this is the natural step after being an R user for some years now. Though I had some ideas, they all sounded either a bit too complicated for a starting package or not useful enough. In an ideal world, and for me personally, I would write a package with a single function that I could use to learn the ropes.
I was doing something that should be easy, but I couldn’t find a solution online: remove overlapping coordinates in a single BED file, where these must be within a certain distance. In this example:
[This is an updated version of this post with improved functions and a reproducible example]
I really like the package Gviz for preparing figures for presentations and publications (I have used it in B with some tidying up in Inkscape).
I am a big fan of Sublime Text! It is a lightweight text editor with an inexpensive license and, thanks to contributions by hundreds of users, it is highly extensible and customizable. From a practical perspective, I prefer it to IDEs such as Jupyter or RStudio because I also write a lot of little bash/shell scripts, or just one-liners embedded in markdown (my project notebooks). Also, the pipeline I am using is based on Groovy. Sometimes I write code in all 4 languages in a single day, so it is easy to see why I prefer a single development environment instead of having to memorize different shortcuts and layouts. Personally, it makes my life easier. I also love the multi-line editing features of Sublime Text, the ability to search within projects, etc.
I use pybedtools a lot in my little scripts. One issue that I encountered recently came up when using those scripts with a custom genome, in this case when mapping to the transcriptome. One of my scripts calculates coverage using genome_coverage(bg=True, genome=genome), and the argument genome is an input from the command line.
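As a rough sketch of one way to deal with this (not necessarily how the original script does it), the command-line value could be checked before it is handed to genome_coverage. The helper name resolve_genome and the two-column chrom-sizes file layout (name, tab, length) are my assumptions for illustration. It returns a dict in the {name: (0, length)} form that, as far as I know, pybedtools uses for chromsizes, or the untouched string when the value looks like an assembly name.

```python
import os


def resolve_genome(genome):
    """Hypothetical helper: turn a command-line `genome` value into
    something that can be passed on as the `genome` argument.

    - If `genome` is the path of an existing chrom-sizes file
      (assumed layout: name <tab> length), read it into a dict
      of the form {name: (0, length)}.
    - Otherwise return the string unchanged, assuming it is an
      assembly name such as "hg19".
    """
    if not os.path.isfile(genome):
        return genome

    sizes = {}
    with open(genome) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Only the first two columns are used; extras are ignored.
            name, length = line.split("\t")[:2]
            sizes[name] = (0, int(length))
    return sizes


# Intended use (assumes pybedtools and bedtools are installed):
#   chromsizes = resolve_genome(args.genome)
#   coverage = bedtool.genome_coverage(bg=True, genome=chromsizes)
```

With this, the same script should accept either an assembly name or a file of transcript lengths for a transcriptome mapping.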
Most of the projects I am involved with deal with mapping reads to repeat regions of the genome, specifically transposons. While not all genomic repeats have exactly the same sequence, it is nonetheless challenging to accurately map as many reads as possible - more reads mapped -> more information (for the same €€).