Jupyter and Jupytext

Jupyter notebooks are a great way to perform post-processing. They let you interactively explore your data and quickly iterate by writing small code blocks.

Launch locally

torx comes with a copy of Jupyter installed. If you’re running torx on your local machine (e.g. a laptop), you can launch JupyterLab by running

source <path to torx>/env/bin/activate
jupyter lab

This should open a tab in your web browser (usually at localhost:8888/) where you can create and open notebooks.

Use port forwarding

If you’re running on a remote machine (e.g. the TOK clusters or Marconi), you can use SSH port forwarding. In one terminal window, run (where XXXX is a four-digit port number)

source <path to torx>/env/bin/activate
jupyter lab --no-browser --port=XXXX

and then, in another terminal on your local machine, run (where YYYY is another four-digit port number)

localuser@localhost: ssh -N -f -L localhost:YYYY:localhost:XXXX remoteuser@remotehost

then open localhost:YYYY/ in a web browser on your local machine. (The -N flag tells SSH not to run a remote command, -f sends the connection to the background, and -L sets up the local-to-remote port forward.) This doesn’t work for all machines: some don’t allow port forwarding.
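If the port you pick is already in use, Jupyter or SSH will refuse to start. One way to find a free local port is to ask the operating system for one, as in this sketch using only the Python standard library (note the returned port is only free at the moment of the call, so use it promptly):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("localhost", 0))
        return s.getsockname()[1]

print(find_free_port())
```

You can then use the printed number in place of XXXX or YYYY in the commands above.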

Notebook servers on Marconi

If you experience issues getting a notebook to run on Marconi with the method above, the following sequence appears to be the most consistent (if you experience issues with this, please edit this entry!):

  1. In a local terminal, set up a tunnel to Marconi first

    localuser@localhost: ssh -L localhost:YYYY:localhost:XXXX remoteuser@remotehost
    

    As above, XXXX will be the port on Marconi that gets linked to port YYYY on your local machine.

  2. This will provide port forwarding and simultaneously open a terminal on Marconi. In this same terminal, launch the server

    source <path to torx>/env/bin/activate
    jupyter lab --no-browser --port=XXXX
    

Use RVS

If you want to do your analysis on an MPCDF machine, you can use the remote visualisation service (RVS). You’ll first need to click ‘Initialise Remote Visualization’ for the machine that you want to run on (this only needs to be done once), then SSH into the machine and install torx (see the installation guide) to get a torx kernel on the machine. Next, launch an RVS session using the web interface and, once you’ve opened a notebook, select torx in the list of kernels.

Note that, unlike the other two options, this method uses the MPCDF version of Jupyter. This doesn’t have jupytext installed, so you’ll need to manually convert from .py to .ipynb files (see below). Alternatively, you can run the following in a terminal on the server

module purge
module load anaconda/3/2021.11
pip3 install jupytext --user --upgrade

This should add jupytext to your RVS environment.

Jupytext: representing Jupyter notebooks as plain-text

Jupyter notebooks have one significant drawback. They’re complicated JSON files with lots of information about when cells were run and what the output looks like. This is useful if you want to reopen a notebook in the same state, but it makes them a pain to review in Gitlab.

Because of this, before sending notebooks to Gitlab, we convert them into normal Python files using a Python package called jupytext. You can read its docs and figure out how to use it yourself if you want, but for convenience I wrote a little helper called nbsync (defined in infra/nbsync.py). This applies jupytext to the notebooks defined in notebooks/notebooks_m.py. Note that to use this tool, you must first activate the torx virtual environment by running

source <path to torx>/env/bin/activate

To convert a Python file, say torx_notebooks/grillix/getting_started.py (which has the key grillix_getting_started), into the corresponding notebook notebooks/grillix/getting_started.ipynb, type

nbsync to_nb --key=grillix_getting_started

You can then open the .ipynb file using Jupyter to edit and execute the code. To write your changes back into the .py file you can use

nbsync to_py --key=grillix_getting_started

You can then commit the changes to the .py file (the .ipynb files are ignored) and push them back to Gitlab.
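For reference, jupytext stores notebooks in a plain-text cell format (by default the ‘percent’ format), so the committed .py file looks roughly like the sketch below. The cell contents here are made up for illustration:

```python
# %% [markdown]
# # Getting started
# Markdown cells become comment blocks tagged with `[markdown]`.

# %%
# Code cells are separated by bare `# %%` markers.
import math

growth_rate = 0.1
gamma_norm = growth_rate / (2 * math.pi)

# %%
print(f"normalised growth rate: {gamma_norm:.4f}")
```

Because this is ordinary Python with special comments, diffs in Gitlab show only the code you actually changed, not cell metadata or outputs.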

If you’re checking out the repository for the first time, you can also execute

nbsync to_nb --key=all

to convert all of the .py files listed in notebooks/notebooks_m.py into their corresponding .ipynb representations (don’t do this on a repository where you’ve made uncommitted changes to the notebooks). There’s also nbsync to_py --key=all to update all .py representations, which can help to make sure that you commit all of your changes.

By default, if you don’t tell nbsync what to do with existing files, it will skip any conversion where the destination file already exists. You can instead choose to --overwrite or --backup the existing destination file: overwrite removes the existing file, while backup renames it by appending a date-time string to the filename.
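The --backup renaming described above works along these lines (this is a sketch, not nbsync’s actual implementation, and the exact timestamp format it uses may differ):

```python
from datetime import datetime
from pathlib import Path

def backup_name(path: str) -> str:
    # Append a date-time string to the filename, keeping the extension.
    # Sketch only: nbsync's real naming scheme may differ.
    p = Path(path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return str(p.with_name(f"{p.stem}.{stamp}{p.suffix}"))

print(backup_name("notebooks/grillix/getting_started.ipynb"))
```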

Pro-tip: if you do delete some of your work on a .ipynb, there’s usually a hidden folder .ipynb_checkpoints in the same folder as your notebook which might contain a backup.