Combining Python, R, and Bash in one Jupyter Notebook makes it easier to track your workflow, simplifies sharing, and makes you more efficient and professional.
Why Jupyter Notebooks?
If you read my Big Data tutorial, you are already familiar with Databricks notebooks. These notebooks allow combining code from many different programming languages (Scala, Python etc.) in one notebook. I thought it would be great to set up a similar notebook environment locally on my computer to manage my workflows.
I used to log all my work steps in a text editor. It was a simple and reliable approach, but not the most efficient one. In particular, it was laborious to copy every command I executed in the terminal into the text editor. Moreover, such log files were not easy to share with my colleagues.
I knew that Jupyter Notebook was the closest open-source equivalent of a Databricks notebook that can run on a desktop, but I knew Jupyter Notebooks only as Python notebooks. After a few minutes of searching online, I found out that one can easily combine Python, R, and Bash in one Jupyter Notebook and execute code in all three languages within it.
I would like to share with you how to set up such a notebook environment.
R and Bash in Jupyter cells
Below, I provide instructions for Linux, but I believe the commands are similar, if not the same, on other operating systems.
First, install Jupyter Notebook. You can install it from your package manager by searching for the jupyter package or by running:
python -m pip install jupyter
Second, install an R interface for Jupyter. You can install it from your package manager by searching for the rpy2 package or by running:
pip install rpy2
Then, load rpy2 in your Jupyter Notebook:
%load_ext rpy2.ipython
In my case, it complained that I had to install pandas as a dependency of rpy2. Once I installed pandas, everything worked correctly.
Note: All these packages should be for the same Python version. So, keep track of whether you install these packages for Python 2 or Python 3.
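A quick way to confirm that the packages ended up in the same Python installation is to ask the interpreter itself. The snippet below is a minimal sketch: the helper name check_environment is hypothetical, and the import names (notebook for Jupyter Notebook, rpy2, pandas) are my assumption about how the packages are exposed. Run it with the same Python that starts Jupyter.

```python
import importlib.util
import sys


def check_environment(packages=("notebook", "rpy2", "pandas")):
    """Report which packages the current interpreter can import.

    The default import names are assumptions; adjust them if your
    installation exposes the packages under different names.
    """
    # find_spec returns None when a top-level package is not installed
    # for the interpreter that is running this code.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Interpreter:", sys.executable)
print("Python version:", sys.version.split()[0])
for name, found in check_environment().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package shows up as MISSING here but pip reports it as installed, it was most likely installed for a different Python version.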
After these simple steps, I was able to execute Python, R, and Bash in one Jupyter Notebook by marking R and Bash cells with the %%R and %%bash Jupyter magic commands.
To test your installation, you can replicate my commands from the image above.
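If you prefer to start from a file, the following sketch writes a small test notebook with one cell per language. The cell contents and the file name test_languages.ipynb are illustrative assumptions, not the exact commands from my screenshot, and the helper build_notebook is hypothetical. It uses only the standard library and the notebook JSON layout (nbformat 4).

```python
import json

# One source string per notebook cell; the cell contents are illustrative.
cells = [
    "%load_ext rpy2.ipython",
    'print("Hello from Python")',
    '%%R\nprint("Hello from R")',
    '%%bash\necho "Hello from Bash"',
]


def build_notebook(sources):
    """Return a minimal nbformat 4 notebook with one code cell per source."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": source,
            }
            for source in sources
        ],
    }


# Write the notebook next to the script; open it in Jupyter and run all cells.
with open("test_languages.ipynb", "w", encoding="utf-8") as handle:
    json.dump(build_notebook(cells), handle, indent=1)
```

Open the generated file in Jupyter and run the cells from top to bottom. If all three greetings are printed, the setup works.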
R and Bash Jupyter kernels
Basically, if you want to use Jupyter Notebooks primarily for Python code with an option to execute Bash and R, the steps described above are enough. However, you can also go further. If you install R and Bash Jupyter kernels, you will be able to use Jupyter Notebooks as notebooks for pipelines in either of these two languages.
To install a Jupyter kernel for Bash, execute:
pip install ipykernel
pip install bash_kernel
python -m bash_kernel.install
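To install an R kernel for Jupyter, you need to run the installation inside R. With the IRkernel package, this is typically:
install.packages('IRkernel')
IRkernel::installspec()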
Now, you should be able to select a particular Jupyter kernel and create a Jupyter Notebook in either Python, Bash or R.
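To check which kernels Jupyter can actually see, you can query the kernel registry from Python. This is a minimal sketch: the helper installed_kernels is hypothetical, and the names python3, bash, and ir are what I assume the default installations register. Your kernels may appear under different names.

```python
def installed_kernels():
    """Return {kernel name: resource directory} for the kernels Jupyter can see."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        # jupyter_client is installed together with Jupyter;
        # without it there is nothing to list.
        return {}
    return KernelSpecManager().find_kernel_specs()


kernels = installed_kernels()
# Assumed default kernel names for Python, Bash, and R.
for name in ("python3", "bash", "ir"):
    print(f"{name}: {'installed' if name in kernels else 'not found'}")
```

The same information is available in the terminal with jupyter kernelspec list.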
I usually use the R and Bash kernels when I work on pipelines written exclusively in R or Bash.
This way, you can use Jupyter Notebooks to log and execute your Python, R, and Bash code together in a single notebook, as well as to create well-annotated, dedicated Python, R, or Bash pipelines.