
Python, R, Bash in one Jupyter Notebook

Combining Python, R, and Bash in one Jupyter Notebook makes it easier to track your workflow, simplifies sharing, and makes your work more efficient and professional.

Why Jupyter Notebooks?

If you have read my Big Data tutorial, you are already familiar with Databricks notebooks, which let you combine code from several programming languages (Scala, Python, etc.) in one notebook. I thought it would be great to set up a similar notebook environment locally on my computer to manage my workflows.
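
To make this concrete, here is a minimal sketch of how cells in a single notebook running a Python kernel can switch between the three languages. It assumes the rpy2 package is installed, which provides the %%R cell magic; the %%bash magic is built into IPython. The "# --- Cell N ---" lines are only separators in this listing; in the notebook each block goes into its own cell, with the %% magic as the first line of that cell. The data frame and file name are just placeholders.

# --- Cell 1: Python ---
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 6]})
df.to_csv("data.csv", index=False)

# --- Cell 2: Bash ---
%%bash
# plain shell commands, executed by IPython's built-in %%bash cell magic
head data.csv
wc -l data.csv

# --- Cell 3: load the R interface once per notebook ---
%load_ext rpy2.ipython

# --- Cell 4: R ---
%%R
# R code, executed by rpy2's %%R cell magic; it reads the file written in Cell 1
df <- read.csv("data.csv")
summary(lm(y ~ x, data = df))

Here the data moves between the languages through a file on disk, which is the simplest approach; rpy2's %%R magic can also pass objects between Python and R directly (via its -i/-o options), but that is beyond this sketch.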

Read More

Genomic variant calling pipeline

I would like to share with you my automatic genomic variant calling pipeline. Such a pipeline becomes essential when a project scales to dozens or hundreds of genomes.


Like probably any beginner, I used to process my genomic data with manual intervention at every step: I would submit mapping jobs for all samples on a computing cluster, wait until they were all done, then submit mark-duplicates jobs, and so on. Moreover, I also wrote the sbatch scripts by hand (my cluster, UPPMAX, uses the Slurm Workload Manager). It was not efficient.

Well, I used replacements (with sed) and loops (with for i in x; do ...) to reduce the amount of work, but many manual steps remained. I managed to process 24-31 small Capsella genomes (~200 Mb) this way during my PhD projects. Now I work with the dog genome, which is much bigger (~2.5 Gb), and I also need to analyze many more samples (82 genomes at the moment). So I had to write this genomic variant calling pipeline to make my workflow as automatic as possible.
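
To illustrate the kind of automation that replaces those manual steps, here is a simplified, hypothetical sketch (not the actual pipeline): a short Python script that writes one Slurm sbatch script per sample and submits it. The sample names, reference file, Slurm account, and the choice of bwa/samtools for the mapping step are all placeholders.

import subprocess
from pathlib import Path

# Placeholder sample names and reference genome; a real pipeline would
# derive these from its inputs or a sample sheet.
samples = ["sample01", "sample02"]
reference = "reference.fa"
account = "my_slurm_project"  # hypothetical Slurm project/account name

for sample in samples:
    # Build a Slurm batch script that maps one sample with bwa mem and
    # sorts the output with samtools (example tools only).
    script = f"""#!/bin/bash
#SBATCH -A {account}
#SBATCH -n 4
#SBATCH -t 24:00:00
#SBATCH -J map_{sample}

bwa mem -t 4 {reference} {sample}_R1.fastq.gz {sample}_R2.fastq.gz \\
  | samtools sort -@ 4 -o {sample}.sorted.bam -
"""
    script_path = Path(f"map_{sample}.sbatch")
    script_path.write_text(script)
    # Hand the generated script to the Slurm scheduler.
    subprocess.run(["sbatch", str(script_path)], check=True)

A full pipeline additionally has to chain the downstream steps (mark duplicates, variant calling, etc.) so that each job starts only after the previous one has finished, which is exactly the bookkeeping that is tedious to do by hand.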

Read More