Context
Jupyter notebooks contain the code to execute, its outputs, and metadata about the notebook. This metadata includes, for example, the number of times a given cell has been run ("execution_count").
Below is an extract of a notebook file. Fields such as "execution_count" and "outputs" are among the data that produce noisy commits.
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "NULL"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "R_INSTALL_DIR=paste(\"../build\", \"wrp/sdrCore/R\", \"\", sep=\"/\" )\n",
        "dyn.load(paste(R_INSTALL_DIR, \"sidres\", .Platform$dynlib.ext, sep=\"\"))\n",
        "source(paste(R_INSTALL_DIR, \"sidres.R\", sep=\"\" ) )\n",
        "names(\"sidres\")\n"
      ]
    },
Keeping the outputs and metadata such as execution_count in a git repository can be annoying: each time someone runs a notebook, those data change, even if the code itself has not changed. This produces git commits containing irrelevant changes and makes the real changes harder to read.
The nbstripout script strips the outputs and metadata from a notebook, so that git only sees the code parts when checking whether the notebook was modified.
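The idea can be illustrated with a minimal Python sketch. This is not nbstripout's actual implementation: the function name strip_notebook and the two sample notebooks are hypothetical, and only the "outputs" and "execution_count" fields of code cells are handled.

```python
import copy


def strip_notebook(notebook: dict) -> dict:
    """Return a copy of the notebook without outputs and execution counts."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []             # drop everything the run produced
            cell["execution_count"] = None   # forget how often the cell was run
    return cleaned


# Hypothetical example: the same code, run a different number of times.
first_run = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {
                    "data": {"text/plain": ["NULL"]},
                    "metadata": {},
                    "output_type": "display_data",
                }
            ],
            "source": ['names("sidres")'],
        }
    ]
}
second_run = copy.deepcopy(first_run)
second_run["cells"][0]["execution_count"] = 7

print(first_run == second_run)                                  # False
print(strip_notebook(first_run) == strip_notebook(second_run))  # True
```

Once stripped, the two versions are identical, which is why git no longer reports a change when only the execution state differs.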
Installation of nbstripout with conda
nbstripout can be found at: https://github.com/kynan/nbstripout.
- install nbstripout in the conda base environment to make it available in all projects:
conda install -c conda-forge nbstripout
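- ${HOME}/.gitattributes contains the following lines (git only reads this file as a global attributes file if core.attributesfile points to it, e.g. git config --global core.attributesfile ~/.gitattributes):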
*.ipynb filter=nbstripout
*.ipynb diff=ipynb
- ${HOME}/.gitconfig contains the following sections:
[filter "nbstripout"]
clean = nbstripout
smudge = cat
required = true
[diff "ipynb"]
textconv = nbstripout -t
These sections can be added either with your favourite editor (should be vim, anyway) or with the following git commands:
git config --global filter.nbstripout.clean nbstripout
git config --global filter.nbstripout.smudge cat
git config --global filter.nbstripout.required true
git config --global diff.ipynb.textconv "nbstripout -t"
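To check that the four settings above are in place, a small Python sketch along these lines may help. It assumes git is on the PATH. The helper names read_global_setting and find_mismatches are hypothetical, and the expected values are simply copied from the commands above.

```python
import subprocess

# Expected values, copied from the git config commands above.
EXPECTED = {
    "filter.nbstripout.clean": "nbstripout",
    "filter.nbstripout.smudge": "cat",
    "filter.nbstripout.required": "true",
    "diff.ipynb.textconv": "nbstripout -t",
}


def read_global_setting(key):
    """Return a global git setting, or None if it is unset or git is missing."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", key],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def find_mismatches(actual):
    """Return {key: (expected, found)} for every setting that differs."""
    return {
        key: (want, actual.get(key))
        for key, want in EXPECTED.items()
        if actual.get(key) != want
    }


if __name__ == "__main__":
    actual = {key: read_global_setting(key) for key in EXPECTED}
    for key, (want, found) in find_mismatches(actual).items():
        print(f"{key}: expected {want!r}, found {found!r}")
```

If the script prints nothing, the global configuration matches the values listed in this section.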