Version Control of Jupyter Notebooks

Anton Bogomazov | June 29th, 2023

Version control is crucial for managing source code files. It allows developers to track changes, collaborate seamlessly, and revert to previous versions when necessary. The same principles apply to data science projects, but traditional version control practices are not always suitable. For instance, they often fall short with the popular Jupyter Notebook format, which stores notebooks as JSON files. Working with the raw JSON in a text editor is impractical because a notebook contains not only text but also source code, rich media output, and metadata. As a result, users typically work with the rendered notebook through a GUI.

The Challenge of Viewing Differences in Jupyter Notebooks

Because users work with the rendered view rather than the underlying JSON, comparing notebook versions is difficult. Users often encounter lengthy diff outputs even for minor changes, as a single-line code modification can change an entire cell output of hundreds of lines. Consequently, the diff becomes verbose, uninformative, and hard to read, and the actual change is obscured.
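To make the problem concrete, here is a minimal sketch that builds two versions of a one-cell notebook as plain dictionaries and counts the lines a text diff of their JSON would report. The helper names `make_notebook` and `changed_line_count` are hypothetical, and the notebook layout is a simplified version of the nbformat 4 structure, not a complete one.

```python
import difflib
import json


def make_notebook(source, output_lines):
    """Build a minimal one-cell notebook dict (simplified nbformat 4 layout)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 1,
                "metadata": {},
                "source": [source],
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": output_lines}
                ],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def changed_line_count(old_nb, new_nb):
    """Count added and removed lines in a unified diff of the serialized JSON."""
    old_text = json.dumps(old_nb, indent=1).splitlines()
    new_text = json.dumps(new_nb, indent=1).splitlines()
    diff = difflib.unified_diff(old_text, new_text, lineterm="")
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# The only edit to the code is the multiplier: 2 becomes 3.
old = make_notebook(
    "for i in range(200): print(i * 2)", [f"{i * 2}\n" for i in range(200)]
)
new = make_notebook(
    "for i in range(200): print(i * 3)", [f"{i * 3}\n" for i in range(200)]
)

# A one-character code change touches hundreds of JSON lines.
print(changed_line_count(old, new))
```

The code cell differs by one character, yet the diff is dominated by the stored output, which is the effect described above.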

Finding a Solution: Introducing nbdime

In one of our projects, which relied heavily on Jupyter Notebooks, we ran into these issues and looked for a solution. Our goal was to see the differences between rendered notebooks rather than compare the JSON file content directly.

A naive solution would be to clear the outputs before committing changes, but this only partially resolves the problem. Output changes no longer appear because no output is stored, but the diff is still a raw JSON diff and remains hard to read. Additionally, storing outputs is often required, and it spares users from re-executing cells with complex computations when they simply want to review the results.
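For reference, clearing outputs can be sketched with nothing but the standard library, since a notebook is a JSON document. This is an illustration under assumptions, not the approach used in the project: `strip_outputs` and `strip_file` are hypothetical helper names, and the code assumes the nbformat 4 layout in which each code cell has `outputs` and `execution_count` keys.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path):
    """Clear the outputs of the notebook stored at `path`, in place."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(strip_outputs(notebook), handle, indent=1, ensure_ascii=False)
        handle.write("\n")
```

Recent nbconvert releases can do the same from the command line with `jupyter nbconvert --clear-output --inplace notebook.ipynb`. Either way, the stripped file is still JSON, so the readability problem remains.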

Fortunately, the Jupyter ecosystem offers several tools for working with notebooks, and one of them, nbdime, provides the desired solution. With nbdime, it becomes possible to view the differences between different versions of Jupyter Notebook files in a human-readable format, both in the console and through a graphical interface. Importantly, nbdime does an excellent job of identifying changes not only in code cells but also in their outputs.
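As a rough sketch of how this looks in practice, the snippet below wraps the two nbdime command-line tools: `nbdiff` for the console and `nbdiff-web` for the browser view. The wrapper functions `nbdiff_command` and `show_diff` are hypothetical names introduced for illustration, and the code assumes nbdime has been installed (for example with `pip install nbdime`).

```python
import shutil
import subprocess


def nbdiff_command(old_path, new_path, web=False, sources_only=False):
    """Build the argument list for an nbdime diff of two notebook files."""
    command = ["nbdiff-web" if web else "nbdiff"]
    # --sources restricts the console diff to cell sources (code and text).
    # It is applied only to the console tool here to keep the sketch simple.
    if sources_only and not web:
        command.append("--sources")
    command += [old_path, new_path]
    return command


def show_diff(old_path, new_path, **options):
    """Run the diff tool and return its exit code."""
    command = nbdiff_command(old_path, new_path, **options)
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"{command[0]} not found; install nbdime first")
    return subprocess.run(command, check=False).returncode
```

For example, `show_diff("old.ipynb", "new.ipynb")` prints a cell-by-cell diff in the terminal, while passing `web=True` opens the rendered comparison in a browser. nbdime also provides `nbdime config-git --enable`, which registers it as the diff and merge driver for `.ipynb` files in Git.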

Enhancing Code Review: Leveraging nbconvert and VCS Extensions

Using nbdime, we solve the problem of effectively viewing differences between commits locally. However, when it comes to code review on services such as Bitbucket, the default diff output remains uninformative.

In such cases, it is worth considering commercial VCS extensions or leveraging another Jupyter tool called nbconvert, which exports notebook content into various formats. A good practice is to generate two artifacts from each notebook: a plain-text script containing the code, and an HTML render of the notebook including its outputs. The script allows code changes to be tracked and reviewed separately, while the HTML file gives users a convenient way to view the notebook outputs, including those from previous versions. It is also a good idea to automate this process with a commit hook that generates the Python and HTML files from the notebook. This keeps all artifacts consistent and streamlines the workflow for developers.
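A commit hook along these lines could be sketched as follows. This is an illustration under assumptions rather than the hook used in the project: the function names are hypothetical, the notebooks are assumed to use a Python kernel (so `--to script` produces a `.py` file), and nbconvert is assumed to write its output next to the notebook, which is its default behavior.

```python
import subprocess
from pathlib import Path


def export_commands(notebook_path):
    """Return the nbconvert invocations that export a script and an HTML render."""
    return [
        ["jupyter", "nbconvert", "--to", "script", notebook_path],
        ["jupyter", "nbconvert", "--to", "html", notebook_path],
    ]


def exported_files(notebook_path):
    """Return the paths nbconvert is expected to write for a Python notebook."""
    path = Path(notebook_path)
    return [path.with_suffix(".py").as_posix(), path.with_suffix(".html").as_posix()]


def staged_notebooks():
    """List notebooks that are added, copied, or modified in the Git index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [name for name in result.stdout.splitlines() if name.endswith(".ipynb")]


def main():
    """Export every staged notebook and stage the generated files with it."""
    for notebook in staged_notebooks():
        for command in export_commands(notebook):
            subprocess.run(command, check=True)
        subprocess.run(["git", "add", *exported_files(notebook)], check=True)
    return 0
```

To use it as a hook, the script would be saved as `.git/hooks/pre-commit`, made executable, given a Python shebang line, and ended with a call to `main()`. Because every `subprocess.run` call uses `check=True`, a failed export raises an exception and aborts the commit, which prevents the notebook and its exported files from drifting apart.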

In conclusion, version control is essential for managing Jupyter Notebooks effectively. While traditional version control practices may not seamlessly handle Jupyter’s JSON-based format, tools like nbdime and nbconvert offer valuable solutions. By adopting these practices, teams can maintain a comprehensive record of code modifications, facilitate thorough reviews, and ensure a smoother and more efficient collaboration process.
