Automatically Generating Data Exploration Code in Python With Mito

antozanini

Antonello Zanini

Posted on January 19, 2024

Data exploration is a crucial step in any data analysis or data science project. It involves examining the data to gain insights and identify patterns or trends. The process is typically challenging and time-consuming, but spreadsheets make it much easier. This is where Mito comes into play!

Mito is a Python library that generates code as you explore data in a spreadsheet, allowing you to improve productivity and save time on data exploration.

Let's learn what Mito is, how to use it, and how it enables you to use AI to explore data through text prompts.

What Is Mito?

Mito is an open-source data exploration tool for Python that provides an easy-to-use UI for exploring, filtering, and managing data in a spreadsheet. It is designed to simplify and streamline the process of data exploration by offering a wide range of features for loading, manipulating, visualizing, and analyzing data in spreadsheets. With Mito, you can explore and edit data just like you would do in Excel. This helps business users gain insights and uncover patterns in their data quickly and efficiently.

What is unique about Mito is that it gives you the Python code equivalent to the data exploration operations you performed visually. This improves data scientists' productivity and allows users to build data exploration scripts without having to know Python.

From a technical point of view, Mito is a spreadsheet embedded in a Jupyter Notebook that can generate pandas code.

Let's see how to use it!

How to Set Up Mito in Python

Learn what you need to do to set up Mito.

Prerequisites

To get started with Mito, you need to meet the following list of prerequisites:

Then, you can install Mito as follows:

  • Open the terminal and download the Mito installer with:
python -m pip install mitoinstaller
  • Run the installer with:
python -m mitoinstaller install

That command will install Mito for classic Jupyter Notebooks and JupyterLab 3.0. Note that the installation process may take a while to complete.
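
If you want to confirm that the installation worked before opening Jupyter, a quick check from Python is possible. The snippet below is only a sketch: it assumes the installer made the mitosheet package importable in the current environment, and is_installed is a hypothetical helper name, not part of Mito.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the package can be located in the current environment."""
    return importlib.util.find_spec(package_name) is not None


# "mitosheet" is the package imported later in this article
if is_installed("mitosheet"):
    print("mitosheet is available")
else:
    print("mitosheet not found: rerun the installer in this environment")
```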

Great, you are now ready to start working with Mito!

Creating a Mitosheet

Launch your Jupyter project and create a new Notebook. Then, paste the following two lines of Python code:

import mitosheet
mitosheet.sheet()

Click the "Run" button and the following window should appear in your Notebook:

The Mito initialization window

Follow the sign-up wizard to enable the Mitosheet, the spreadsheet with code generation capabilities offered by Mito.

Importing Some Data

Click on the "Import Files" button and select the data source you want to import into Mito:

Importing data in Mitosheet

Mito supports several data sources. These include:

  • CSV files, both local and remote

  • Excel files, both local and remote

  • Dataframes
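
For the dataframe case, the sketch below passes an existing pandas dataframe to the sheet. It assumes that mitosheet.sheet() accepts dataframes as positional arguments and that the code runs in a Jupyter Notebook. The homes dataframe and its values are made up for illustration.

```python
import pandas as pd

# Made-up sample data, for illustration only
homes = pd.DataFrame(
    {
        "Suburb": ["Abbotsford", "Richmond", "Carlton"],
        "Rooms": [2, 3, 4],
        "Price": [1035000, 850000, 1600000],
    }
)

try:
    import mitosheet

    # Assumption: sheet() takes dataframes as positional arguments
    # and displays them when run inside Jupyter
    mitosheet.sheet(homes)
except ImportError:
    print("mitosheet is not installed in this environment")
```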

If your source data gets imported successfully, you should see something similar to:

The data source represented in a Mitosheet

Note the advanced spreadsheet capabilities offered by Mito.

Explore Data Through Text Prompts With Mito AI

Mito has recently launched a new feature called Mito AI. This is a powerful tool that enables users to edit data in a spreadsheet with plain text prompts. At the time of writing, the feature is in open beta.

Click the "AI" button and accept OpenAI's privacy policy. You should now get access to the AI Transformation section:

The AI Transformation section

In the "Prompt" text area, type the operation you want to perform on your data. For example: "filter out rows with a Price lower than 200000."

Then, click the "Generate Code" button. Mito AI will generate Python code that attempts to make the desired edit on the data. Inspect the generated code and, if it looks good, click "Execute Code." After the code executes, scroll down to the "Results" section to see its effect on your data.
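
The exact code Mito AI returns depends on the model and on your data, so it cannot be reproduced here. As a rough idea, the example prompt above boils down to a one-line pandas filter like the one in this sketch, which uses a small made-up dataframe named df.

```python
import pandas as pd

# Made-up sample data with a Price column
df = pd.DataFrame(
    {
        "Rooms": [1, 2, 3, 4],
        "Price": [150000, 199999, 200000, 780000],
    }
)

# Keep only the rows whose Price is at least 200000,
# i.e. filter out rows with a Price lower than 200000
df = df[df["Price"] >= 200000]

print(df)
```

Checking the generated code against a filter like this one is a simple way to verify that it does what the prompt asked before you click "Execute Code."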

Well done! With Mito, exploring data in Python has never been easier, but there is still a lot to learn!

Generating Code for Data Exploration With Mito

All that remains now is to visually explore the source data in the Mitosheet. Edit, add, remove, sort, and filter out some data with some point-and-click operations.

Once you complete an operation, Mito adds a new Notebook cell containing the equivalent code. That automatically generated snippet is the Python logic required to reproduce the results you achieved visually in the Mitosheet.

In the example below, we use Mito to create a pivot table directly in the spreadsheet:

Generating a pivot table in Mitosheet

This is what the Notebook cell generated by Mito at the end of the data exploration operation looks like:

The Notebook cell with Python code generated by Mito

In detail, this is the code produced by the tool:

import pandas as pd

# Imported melb_data.csv
melb_data = pd.read_csv(r'melb_data.csv')

# Deleted columns Unnamed: 0
melb_data.drop(['Unnamed: 0'], axis=1, inplace=True)

# Pivoted melb_data into melb_data_pivot
melb_data_pivot = pd.DataFrame(data={})

# Pivoted melb_data into melb_data_pivot
tmp_df = melb_data[melb_data['Price'] >= 200000]
tmp_df = tmp_df[['Price', 'Rooms']].copy()
pivot_table = tmp_df.pivot_table(
    index=['Price'],
    columns=['Rooms'],
    values=['Price'],
    aggfunc={'Price': ['count']}
)
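# Note: flatten_column_header is a helper from the mitosheet package, not pandas;
# it must be imported for this cell to run
pivot_table = pivot_table.set_axis([flatten_column_header(col) for col in pivot_table.keys()], axis=1)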
melb_data_pivot = pivot_table.reset_index()

As you can see, it contains everything you need to create a pivot table in Python with pandas, comments included.
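
The generated cell relies on melb_data.csv and on Mito's flatten_column_header helper, so it will not run on its own. The sketch below is a simplified, self-contained variant: it uses a tiny made-up dataframe, groups by Rooms instead of Price, and replaces the helper with flatten_header, a hypothetical stand-in that may not match Mito's exact naming of flattened columns.

```python
import pandas as pd

# Made-up sample standing in for melb_data.csv
melb_data = pd.DataFrame(
    {
        "Rooms": [2, 3, 3, 4, 2],
        "Price": [150000, 450000, 620000, 980000, 310000],
    }
)


def flatten_header(col):
    """Hypothetical stand-in for Mito's helper: join multi-level column labels."""
    if isinstance(col, tuple):
        return " ".join(str(level) for level in col if level != "")
    return str(col)


# Same price filter as in the generated code
tmp_df = melb_data[melb_data["Price"] >= 200000]

# Count and average the prices for each number of rooms
pivot_table = tmp_df.pivot_table(
    index=["Rooms"],
    values=["Price"],
    aggfunc={"Price": ["count", "mean"]},
)

# Turn the two-level column labels into plain strings
pivot_table = pivot_table.set_axis(
    [flatten_header(col) for col in pivot_table.keys()], axis=1
)
melb_data_pivot = pivot_table.reset_index()

print(melb_data_pivot)
```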

Note that this is just a simple example. Mito supports many other advanced data exploration and visualization features, including graphing, spreadsheet formulas, combining dataframes, and more.

The Mito data charting capabilities in action

Explore the official documentation to find out what else Mito has to offer!

Conclusion

In this article, you learned what Mito is and how it can help you produce data analysis scripts in Python. By loading data into a spreadsheet inside a Jupyter Notebook, it lets you explore data visually while automatically generating the corresponding Python code. This saves time and energy, and allows even non-technical users to define data exploration scripts in Python.

Thanks for reading! I hope you found this article helpful.


The post "Automatically Generating Data Exploration Code in Python With Mito" appeared first on Writech.
