Exploring the shortcomings of Jupyter Notebook and its alternative
Avi Chawla
Jupyter notebooks have been an indispensable tool for data science workflows for years. They are used for data mining, analysis, processing, modeling, and the general day-to-day experimentation that runs through the life cycle of every data science project.
Despite their popularity, many data scientists have pointed out their numerous shortcomings.
Deepnote, like Jupyter, is a data science notebook. It aims to serve individuals as well as teams across all their data science tasks while addressing the major shortcomings of Jupyter.
If you have been following my blogs lately, you may have noticed that I have been using Deepnote for all my projects, and it has streamlined the whole data ecosystem for me.
Therefore, this post serves as an introduction to Deepnote and what makes it a revolutionary data science notebook in my opinion. The article is organized as follows:
Getting Started
A First Look At the Notebook
Jupyter vs Deepnote
Final Thoughts
Let’s begin 🚀!
Getting Started
To use Deepnote, you should create an account (which is free). Once you log in, create a notebook via Create new → New project, which will open a notebook as shown in the image below:
A First Look At the Notebook
For convenience, I have divided the notebook into three regions — surrounded by blue, orange, and green boxes.
- You will see your existing projects in the Workspace section on the left (blue region). Lately, I have been running the experiments for all my Medium blogs on Deepnote, and this section lists all the notebooks I have created.
- Data is the most basic requirement of every data science project. Integrations on the left allow you to connect to various data sources and use them in your pipeline. Of the currently supported integrations, the ones you are most likely to use are:
  - Shared Dataset: This allows you to upload a dataset manually to their servers.
  - Google Drive: Mount Google Drive to the notebook and use it seamlessly, as you would with Google Colab.
  - Amazon S3: Here, you can mount an Amazon S3 bucket and use it for I/O operations in the notebook.
- The orange region provides the space to write code and experiment. There’s an option to upload existing code from three sources, namely Jupyter Notebook, GitHub, and Google Drive. I will get back to more of its features in the next section of this blog.
- Within the green box, at the top, there’s an option to share the notebook via a URL, eliminating the need to share code via email, GitHub, etc. Below that, you see the Python environment, memory utilization, a terminal, and the table of contents. The Python version running in this notebook is 3.9.
Jupyter vs Deepnote
Having discussed an overview of the notebook’s interface, let’s compare Jupyter Notebook and Deepnote head to head on various parameters.
The parameters I will discuss below are pertinent to some of the most general use-cases of a notebook for an individual as well as a data science team. These are:
#1 Real-time Collaboration
#2 Managed Environment
#3 Code Embeds
#4 Data Visualisation
#5 Utility for Teams
Let’s discuss them one by one below.
#1 Real-time Collaboration
How many times have you been on a call, sharing your screen with a colleague to find errors in code (or to resolve general queries)? I have been there many times, and I am sure you have too.
Jupyter notebooks, although an excellent tool for data scientists working alone, have not been among the best choices for team collaboration, as they are usually hosted locally on one’s desktop.
Deepnote, on the other hand, allows you to collaborate within the same notebook. The process is as simple as sharing a link with your team members, which is demonstrated below:
The steps are as follows: Share & Publish → Enable Sharing → Select Permission → Copy Link → Share. Done!
Similar to Google Docs, as a workspace owner, you can specify various permission levels for each potential collaborator. These include view, execute, comment, edit, and full access.
#2 Managed Environment
Setting up the conda environment on your local computer can sometimes be a challenge of its own. This gets overwhelming, especially when you switch your system and have to repeat the process, introducing unnecessary redundancy.
Deepnote does the heavy lifting of installing the modules and setting up the environment to run Python. As a result, data scientists can take advantage of its fully managed and hosted solution without worrying about managing different versions of Python.
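To see what a given environment actually provides, a cell along the following lines can report the interpreter version and whether a few packages are installed. This is a minimal sketch using only the Python standard library; the helper name `describe_environment` and the package names passed to it are illustrative assumptions, not part of Deepnote.

```python
import sys
from importlib import metadata


def describe_environment(packages):
    """Return the interpreter version and the installed version of each package.

    `describe_environment` is a hypothetical helper for illustration only.
    Packages that are not installed are reported as None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    python_version = "{}.{}".format(*sys.version_info[:2])
    return {"python": python_version, "packages": versions}


# The package names below are examples; swap in whatever your project needs.
info = describe_environment(["pandas", "numpy"])
print(info)
```

Running the same cell after switching machines or notebooks is a quick way to confirm that the environment matches what the project expects.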
In addition to Python, the Deepnote notebook also natively supports executing SQL queries.
#3 Code Embeds
Among my favorite features of Deepnote is the ability to embed code blocks in my blog here on Medium.
Code blocks are integral to a quality blog that illustrates a programming concept. As Jupyter notebooks run locally, there’s essentially no other way but to first copy the code to GitHub, create GitHub gists, and embed them in the blog.
Deepnote allows you to experiment and create code embeds in a single place, eliminating the need to create GitHub gists specifically for this purpose.
Moreover, with GitHub gists, other than adding the output as a comment in the gist itself, you have no way to show the output of the code. Deepnote cells, on the other hand, give you three options: embed only code, embed only output, and embed both code and output.
A code embed created from Deepnote looks as follows:
#4 Data Visualisation
Exploratory Data Analysis (EDA) is a vital step in building data science applications, as it allows you to generate insights from the given data, which in turn assist in building machine learning models.
Although Python libraries like Matplotlib, Seaborn, Plotly, etc., are great tools in this regard, their no-code alternatives are gaining more and more popularity among data scientists (read my blog below to know more).
Jupyter notebooks inherently do not provide any way to perform EDA on the given data except by writing code explicitly. Although open source libraries like Lux have taken a step in this direction by automatically generating visualizations for the given data, many (including me) have found that it produces plots that have no utility or are irrelevant to the task at hand.
Deepnote understands the no-code preference of data scientists and provides a visualization tool within the notebook itself. To illustrate this, I have provided a visualization walkthrough on the iris dataset:
As demonstrated above, the visualization block allows you to generate insights the same way as you would with Python libraries, but efficiently and without needing any code.
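For contrast, here is roughly what the code-first route looks like for one simple question: the mean petal length per species. This is a minimal sketch using only the Python standard library. The six rows are a small hand-copied sample of the iris dataset (two per species), and the names `IRIS_SAMPLE`, `mean_by_group`, and `text_bar_chart` are illustrative assumptions rather than anything from Deepnote or a plotting library.

```python
from collections import defaultdict
from statistics import mean

# A tiny sample of the iris dataset (two rows per species), used here only
# to keep the example self-contained.
IRIS_SAMPLE = [
    {"species": "setosa", "petal_length": 1.4},
    {"species": "setosa", "petal_length": 1.4},
    {"species": "versicolor", "petal_length": 4.7},
    {"species": "versicolor", "petal_length": 4.5},
    {"species": "virginica", "petal_length": 6.0},
    {"species": "virginica", "petal_length": 5.1},
]


def mean_by_group(rows, group_key, value_key):
    """Group rows by `group_key` and average `value_key` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {name: mean(values) for name, values in groups.items()}


def text_bar_chart(summary, width=40):
    """Render the summary as a crude text bar chart, scaled to the largest value."""
    largest = max(summary.values())
    lines = []
    for name, value in summary.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<12}{bar} {value:.2f}")
    return "\n".join(lines)


summary = mean_by_group(IRIS_SAMPLE, "species", "petal_length")
print(text_bar_chart(summary))
```

Even this small question takes a grouping step, an aggregation, and some rendering logic, which is the overhead a no-code visualization block removes.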
#5 Utility for Teams
All the projects that I have completed so far on Deepnote were conducted individually. However, as per my understanding and experience, Deepnote holds immense utility for a group of people or a team of data scientists working together, such as:
- Feedback through comments: Each cell in Deepnote allows the collaborator to leave comments, eliminating the need to switch back and forth between messaging apps and code to provide feedback.
- Code Development Tracking: With access to a developer’s code, managers and other team members can easily monitor the code progress and development lifecycle.
- Code Hosting Services Not Required: As Deepnote is responsible for managing and handling your code, teams do not need to push their coded pipelines to tools like GitHub, BitBucket, etc. — thereby reducing operational costs.
- Python Environment Management: An added benefit of being managed and hosted is that organizations do not need to worry about version management and system updates.
- Access Management: Often, while working in teams, different people may be entitled to different levels of access depending on their roles. Deepnote makes it easy for workspace owners to distribute ownership or restrict access among team members with the click of a button. Head over to Share & Publish → Collaborators → Manage Collaborators → Enter email and Select Access-type → Add Collaborator. Done!
Final Thoughts
Jupyter notebooks have undoubtedly been the go-to tool for numerous data science workflows for years now.
However, as teams move towards more efficient systems for collaboration, Jupyter notebooks have started to look inadequate and obsolete in the realm of large-scale data science projects.
This has led organizations to pivot towards more practical solutions, among which Deepnote stands out for the reasons discussed above.
Although alternatives like Google Colab may work, I personally prefer not to use it because of its usage limits and privacy issues. This is primarily a concern for teams working on private user-facing applications, making Deepnote a great option to explore.
Thanks for reading!