Predicting & Visualizing Petroleum Production with MindsDB and Tableau
Odumuyiwa Teslim
Posted on October 31, 2022
Introduction
MindsDB is an open-source machine-learning tool that brings automated machine learning to your database. MindsDB offers predictive capabilities in your database. Tableau lets you visualize your data easily and intuitively. In this tutorial, we will be using MindsDB to predict the hourly electricity demand in the United States and visualize results in Tableau. To complete this tutorial, you are required to have a working MindsDB connection, either locally or via cloud.mindsdb.com. You can use this guide to connect to the MindsDB cloud.
Data Setup
Connecting the data as a file
Follow the steps below to upload a file to MindsDB Cloud.
- Log in to your MindsDB Cloud account to open the MindsDB Editor.
- Navigate to
Add data
the section by clicking theAdd data
button located in the top right corner.
- Choose the Files tab.
- Choose the
Import File
option. - Upload a file (
[national_nameplate_capacity.csv](https://www.kaggle.com/datasets/maurosrgiofigueira/fuel-output-capacity-brazil), name a table used to store the file data (here it is
petroleum_production), and click the
Save and Continue` button.
Once you are done uploading, you can query the data directly with;
sql
SELECT * FROM files.petroleum_production LIMIT 10;
The output would be:
Understanding the Dataset
This dataset presents Brazilian petroleum products ouput capacity evolution, from 2011 to 2020 in each Brazilian refinery. Data is displayed in barrels. From the Brazilian National Petroleum Agency.
Context
- Refinery - They are 19 refinery in this dataset.
- Year - The year of this dataset ranges from 2011 to 2020.
- Nameplate Capacity - Nameplate capacity, also known as the rated capacity, nominal capacity , installed capacity, or maximum effect, is the intended full-load sustained output of a facility such as a power station, fuel plant and etc. (Source: Wikipedia)
Creating the Predictor
To being, let’s create a predictor that uses the date to predict the Petroleum Production for refineries. You can learn more about creating a predictor by checking here. You can predict a classification series model using the following syntax
sql
CREATE PREDICTOR mindsdb.[predictor_name]
FROM [integration_name]
(SELECT [sequential_column], [partition_column], [other_column], [target_column]
FROM [table_name])
PREDICT [target_column]
-
CREATE PREDICTOR
: Creates a predictor with the namepredictor_name
in themindsdb
table. -
FROM files
: Points to the table containing the data. -
PREDICT
: Dictates the column to predict.
`sql
CREATE PREDICTOR mindsdb.petroleum_output
FROM files (SELECT refinery, Year, Nameplate_Capacity FROM petroleum_production)
PREDICT Nameplate_Capacity
`
On execution we get:
Query successfully completed
Status of a Predictor
A predictor may take a couple of minutes for the training to complete. You can monitor the status of the predictor by using this SQL command:
sql
SELECT status
FROM mindsdb.predictors
WHERE name='petroleum_output';
Your output should be:
sql
+------------+
| status |
+------------+
| complete |
+------------+
Making Predictions
Now that we have our Prediction Model, we can simply execute some simple SQL query statements to predict the target value based on the feature parameters:
sql
SELECT Year, Nameplate_Capacity
FROM mindsdb.petroleum_output
JOIN files.petroleum_production
LIMIT 100;
Connecting MindsDB to Tableau
Tableau lets you visualize your data easily and intuitively. In this tutorial will use Tableau to create visualizations of our predictions.
How to Connect
- First, create a new workbook in Tableau and open the Connectors tab in the Connect to Data window.
- Click on MySQL
• Input “cloud.mindsdb.com” for Server, “3306” for Port, “mindsdb” for Database “your mindsdb cloud email” for Username, “your mindsdb cloud password” for Password, and Sign in.
Now you are connected and your page should look like this:
Visualizing our Data
Before you can visualize predictions in Tableau, you must first choose a data source. And because the predictions in this article are generated using a SQL statement, you will need to create a custom SQL query in Tableau to generate the data source. To do this:
- First, select the
New Custom SQL
on the left side of the window and use the query below to generate the petroleum production for each year. You can preview the results or directly load the data into Tableau.
sql
SELECT Year, Nameplate_Capacity
FROM mindsdb.petroleum_output
JOIN files.petroleum_production
- Create an extract of the data under the connection heading at the top right of the window. You do this to facilitate data conversion to the appropriate data type. The extraction should take some time.
- Move to the Sheet tab on the bottom left and right-click on the
Nameplate_Capacity
andYear
to convert their data types to Number(whole) and Date, respectively. Additionally, when right-clicking on theNameplate_Capacity
andDate
, choose the option to convert it to a continuous measure.
- Drag the
Nameplate_Capacity
measure to the row shelf and theYear
dimension to the column shelf. - You can also switch the visualization also to line, area and etc.
What’s Next?
Have fun while trying it out yourself!
- Star the MindsDB repository on GitHub.
- Sign up for a free MindsDB account
- Engage with the MindsDB community on Slack or GitHub to ask questions and share your ideas and thoughts.
Give a like or a comment if this tutorial was helpful
Posted on October 31, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.