While you can do some good data analysis with a spreadsheet like Excel, if you want to take your calculations to the next level, you might try Python in a Jupyter notebook instead. Here are some reasons why.
Separation of Data and Operations
The advantage of Jupyter notebooks over spreadsheets like Excel or LibreOffice Calc comes from what you’re unable to do. With a Jupyter notebook, you can’t accidentally wipe out or alter your spreadsheet. This is because the Jupyter notebook keeps the data separate from the operations on it.
Using Python, you’ll typically import your data from an Excel or CSV file using pandas’ built-in methods. You can then examine a column or perform a regression without potentially messing up your data. You can even alter the shape of the table by creating pivot tables or “melting” columns together. You can save your results back into an Excel or CSV file.
With a Jupyter notebook, the data stays separate from the operations because it’s held in memory until you save it back out to another file. Or you might simply keep the results in your notebook and leave the original data untouched.
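As a rough sketch of that workflow, here’s what a notebook cell might look like. The file name "sales.xlsx" and the "revenue" column are made-up stand-ins for your own data:

```python
import pandas as pd

# Load the spreadsheet into memory; the file on disk is never touched.
df = pd.read_excel("sales.xlsx")  # or pd.read_csv("sales.csv")

# Explore without any risk to the source file.
print(df.head())             # first few rows
print(df["revenue"].mean())  # summary statistic for one column

# Write results to a *new* file, leaving the original untouched.
df.describe().to_csv("sales_summary.csv")
```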
Reproducibility When Sharing With Others (and Yourself!)
Keeping the data separate from the operations comes in handy when you want to share your results with others. Spreadsheets like Excel are designed for doing calculations, not so much for generating reports.
With Jupyter notebooks, you can show your work by interleaving Python code on your data with commentary or plots of data. Suppose you want to show your coworkers that people who buy widgets also buy whatsits with a regression analysis. In that case, you can show the scatterplot, the regression line, and the code that generates the regression equation, all in the same document.
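Here’s one way that might look in a notebook cell, using a hypothetical purchases.csv file and invented "widgets" and "whatsits" columns:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical purchase data; the column names are invented for illustration.
df = pd.read_csv("purchases.csv")

# Scatterplot with a fitted regression line, rendered right in the notebook.
sns.regplot(x="widgets", y="whatsits", data=df)

# The regression equation itself, from a simple least-squares fit.
slope, intercept = np.polyfit(df["widgets"], df["whatsits"], 1)
print(f"whatsits ~ {slope:.2f} * widgets + {intercept:.2f}")
```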
This is why Jupyter notebooks are becoming common in the scientific community. In the scientific world, research has to be reproducible. Researchers must demonstrate how they obtained their results. A Jupyter notebook, mixing code, images, and commentary, is a perfect vehicle for research. In modern academia, researchers frequently collaborate at different institutions. With a Jupyter notebook and the corresponding data, a collaborator can get the same result.
You might not be an academic researcher, but this approach can benefit you, even if you only collaborate with yourself. How many times have you opened an Excel spreadsheet days or weeks later and struggled to remember what you did and why you did it? Jupyter notebooks can solve that problem. First, the data is kept separate from the code that operates on it, and second, you can explain your rationale for your calculations, and any insights you glean from your data, right there when you see them. Then you won’t have to refresh your memory when you reopen your notebook.
Since it seems everyone is becoming a data analyst these days, by choice or not, Jupyter notebooks can make life easier, whether you’re tracking your workouts or your action figure collection.
Python Scales to Larger Datasets
People reach for spreadsheets like Excel because they seem easy to learn, with all of the pointing and clicking and dragging. But if you have a large dataset, all that pointing, clicking, and dragging becomes a drag after a while. You might find your fingers going numb from all of those mouse operations.
Using Python to extract information from your data might seem like a steep learning curve to climb, but you’ll be able to see everything much more clearly when you get to the top.
You’ll have to spend some time learning to code, but you had to do the same thing when you learned Excel functions or the functions in some other spreadsheet. Python is easy to learn, which is why it’s already a favorite for introductory programming classes. You can pick up the basics from a book or the online Python tutorial. You’ll be rewarded with another in-demand skill.
The advantage of Python is that you can apply operations to larger datasets with hundreds, even thousands, of data points easily. If you want to take the mean of a column, you can use a built-in pandas method. If you use a Jupyter notebook, you’ll also only have to do it once, since the results will be saved.
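For instance, a sketch like the following, using a column of synthetic data just to make the scaling point, computes a mean over a million values as easily as over ten:

```python
import numpy as np
import pandas as pd

# A million synthetic readings, generated only to illustrate scaling.
df = pd.DataFrame({"reading": np.random.default_rng(0).normal(size=1_000_000)})

# The same one-line pandas method works no matter how many rows there are.
print(df["reading"].mean())
```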
Not only do the Python functions, both built-in and through libraries like NumPy and pandas, scale up to larger datasets, they’re also more extensive than what you can find in spreadsheets like Excel or LibreOffice Calc.
You can run multiple regressions easily with libraries like statsmodels, and also model quadratic regressions as opposed to the simple linear regressions you find in most spreadsheet packages.
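A sketch of what that could look like with statsmodels’ formula interface; the file name and the "sales", "ad_spend", and "price" columns are invented for illustration:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset with made-up column names.
df = pd.read_csv("sales.csv")

# Multiple regression: two predictors at once.
multi = smf.ols("sales ~ ad_spend + price", data=df).fit()
print(multi.summary())

# Quadratic regression: just add a squared term to the formula.
quad = smf.ols("sales ~ ad_spend + I(ad_spend ** 2)", data=df).fit()
print(quad.params)
```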
You can also automate operations by writing Python scripts.
Keeping Data Clean
Working on data sets with pandas also allows you to deal with messy data, as data from the real world often is. You can remove missing data that’s skewing your analysis.
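For example, with a toy DataFrame containing a gap, you might handle missing values like this:

```python
import numpy as np
import pandas as pd

# A toy DataFrame with a hole in it; real-world data is rarely this tidy.
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"],
                   "sales": [100.0, np.nan, 120.0]})

clean = df.dropna()                                 # drop rows with missing values...
filled = df.fillna({"sales": df["sales"].mean()})   # ...or fill them in instead
```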
You can create pivot tables that stretch a DataFrame out into a wider view, or melt a DataFrame into one that’s longer. This helps when you want to plot your data: when every column represents a distinct variable, it’s easier to plot and easier to run regressions. If you’ve manipulated your data, you can save it back into another spreadsheet. If you do, it’s a good idea to save this modified version in a different spreadsheet file, just to keep your original data intact.
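Here’s a rough sketch of that reshaping, with made-up store and month columns, saving the result to a new file:

```python
import pandas as pd

# A "wide" layout: one row per store, one column per month (all names invented).
wide = pd.DataFrame({"store": ["North", "South"],
                     "Jan": [250, 310],
                     "Feb": [265, 295]})

# Melt into a "long" layout: one row per observation, one column per variable.
long_form = wide.melt(id_vars="store", var_name="month", value_name="sales")

# Pivot back out to the wide view when that's easier to read.
back = long_form.pivot_table(index="store", columns="month", values="sales")

# Save the reshaped data to a *new* file so the original spreadsheet stays intact.
long_form.to_csv("sales_long.csv", index=False)
```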
The separation of code and data using Python and Jupyter notebooks also helps reduce data errors. With the data in memory, there’s less risk of accidentally overwriting your spreadsheet.
You Can Import Your Data from Spreadsheets
You don’t have to give up your favorite spreadsheet programs when you work with Python using Jupyter notebooks. Since you can import data from Excel and CSVs, you can use them in tandem, playing to the strengths of each platform.
You can use Excel or another spreadsheet program like LibreOffice Calc to enter the data and format it, and import it into Jupyter using pandas to analyze and visualize it. Working with data in this way encourages an approach called exploratory data analysis. With a visualization library like Seaborn, you can quickly create plots that can help you find insights into your data. These can include scatterplots, regression plots, histograms, and boxplots.
These look nicer than the plots you’d find in Excel. They’re also geared toward gleaning information from your data, while spreadsheet charts are better suited to business presentations.
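For instance, a few exploratory plots might take one line each; the spreadsheet file and the "group", "height", and "weight" columns below are hypothetical:

```python
import pandas as pd
import seaborn as sns

# Hypothetical measurements imported from a spreadsheet for exploratory analysis.
df = pd.read_excel("measurements.xlsx")

sns.histplot(data=df, x="height")                 # distribution of one variable
sns.boxplot(data=df, x="group", y="weight")       # spread across categories
sns.scatterplot(data=df, x="height", y="weight")  # relationship between two variables
```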
If you’re really serious about formatting data, you might try using a simple database engine like SQLite to handle the data, with a graphical front-end like DB Browser. This lets you enforce constraints on data fields, such as requiring a field to hold a number instead of a string.
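If you went that route, a minimal sketch with Python’s built-in sqlite3 module might look like this; the table and column names are made up, and the CHECK constraint is just one way to insist that a field holds a number:

```python
import sqlite3
import pandas as pd

# Hypothetical inventory table; the CHECK constraint rejects non-integer quantities.
conn = sqlite3.connect("inventory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS items (
        name     TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (typeof(quantity) = 'integer')
    )
""")
conn.execute("INSERT INTO items VALUES (?, ?)", ("widget", 42))
conn.commit()

# Pull the validated data straight into pandas for analysis.
df = pd.read_sql("SELECT * FROM items", conn)
conn.close()
```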
You might find that the two programs complement each other. You might use Excel or LibreOffice Calc to decide how to structure and format your data, then bring it into Python with pandas when you want to run a regression and let your colleagues verify how you arrived at your conclusions. You can not only show your results, but also explain them using text and plots. Your collaborators can add their own contributions, and you can verify their methods in turn.
Jupyter notebooks have revolutionized the world of data science. You don’t have to be a scientist to use them to take the headaches out of analyzing data.



