How to Forecast Air Temperatures with AI + IoT Sensor Data
MindsDB Team
Posted on March 24, 2024
We all grasp the concept of forecasting. But when forecasting involves machine learning, it means applying serious compute power to vast quantities of data, then using AI to predict future values.
Time-series AI models specialize in making forecasts. However, to effectively utilize these models, the input data must undergo preprocessing. Specifically, quantitative values must be recorded at uniform time intervals, such as daily or monthly. This preparation ensures that the time-series models can accurately analyze and predict future trends.
What if your data is irregular?
If your data lacks uniform time intervals between consecutive entries, QuestDB offers a solution by allowing you to sample your data. After that, MindsDB facilitates creating, training, and deploying your time-series models.
Experiment with various data sampling intervals and methods to find the optimal ones and achieve the best possible model accuracies. Through this iterative process, you can refine your forecasts and enhance the performance of your time-series model.
In the following tutorial, we'll delve into forecasting air temperatures using data collected from IoT sensors.
The time-series model will be trained on (historical) data collected from IoT sensors, including humidity, rain intensity, wind direction and speed, barometric pressure, solar radiation, and air temperature. The model will learn patterns and correlations between recorded values to forecast air temperatures for the upcoming days.
Prerequisites
Before proceeding with this tutorial, be sure you have the following:
MindsDB
Install MindsDB locally.
QuestDB
Install QuestDB locally or sign up for the QuestDB Cloud account, which provides a free tier of $200.
Alternatively, connect to the demo QuestDB database provided in this tutorial.
Data
If you use a local QuestDB installation, import the chicago_weather_stations dataset.
Alternatively, if you haven’t set up QuestDB, go ahead and use the demo QuestDB database provided in this tutorial that already contains this dataset.
Getting started
In this tutorial, we use the chicago_weather_stations dataset. It contains data collected by the sensors at different timestamps. Note that this data lacks uniform time intervals between consecutive entries, as shown below.
SELECT *
FROM questdb.chicago_weather_stations
WHERE StationName = '63rd Street Weather Station'
ORDER BY MeasurementTimestamp;
Here is the breakdown of columns and their roles:
Target
The target column is the column whose values will be forecasted. Here, it is the AirTemperature column.
Features
The feature columns are all columns used to train the model. Here, these are WetBulbTemperature, Humidity, RainIntensity, IntervalRain, TotalRain, PrecipitationType, WindDirection, WindSpeed, MaximumWindSpeed, BarometricPressure, and SolarRadiation. Note that the historical values of the target column are used to train the model as well.
ORDER BY
The time-series models require a column to be used for ordering data, such as a date or time column. Here, it is the MeasurementTimestamp column that stores the timestamps of all measurements.
Note that the MeasurementTimestamp column should store timestamps with equal time intervals. For example, the measurements should be done daily or monthly. This will be ensured by using the data sampling feature provided by QuestDB.
GROUP BY
Optionally, you can provide a column to be used for grouping data. Here, we want to get separate forecasts for different weather stations, thus, the StationName is used to group data.
The parameters that may influence the accuracy of the model include the following:
Sampling frequency
The more granular sampling frequency causes more data to be inserted via linear extrapolation or simply nulls, which is not good quality training data. For example, daily sampling of sensors’ data may result in less accurate models than that of monthly sampling.
WINDOW
This parameter defines the training window for the model. For example, if the data is sampled daily and we define WINDOW 7, then the training window is equivalent to one week.
HORIZON
This parameter defines the number of forecasted values by the model. For example, if the data is sampled monthly and we define HORIZON 3, then the forecasts are made for the next quarter.
With this knowledge of how data from IoT sensors is used to train the model, let’s move on to the tutorial.
Step-by-Step Tutorial
This tutorial presents how to preprocess data from IoT sensors with QuestDB and use it to create, train, and deploy time-series models with MindsDB.
Note that all statements presented in this tutorial are to be executed in the MindsDB editor.
Step 1. Connect QuestDB to MindsDB
MindsDB integrates with numerous data sources and AI frameworks, allowing users to connect their data to AI models easily.
Here is how to connect the demo QuestDB database to MindsDB:
CREATE DATABASE questdb
WITH ENGINE = "questdb",
PARAMETERS = {
"user": "demo_user",
"password": "demo_password",
"host": "3.220.66.106",
"port": "8812",
"database": "qdb"
};
If you installed QuestDB locally, use these parameters:
CREATE DATABASE questdb
WITH ENGINE = "questdb",
PARAMETERS = {
"user": "admin",
"password": "quest",
"host": "localhost",
"port": "8812",
"database": "qdb"
};
Now, query for the input data used in this tutorial:
SELECT *
FROM questdb.chicago_weather_stations;
Step 2. Sample data with QuestDB
QuestDB provides custom syntax that allows users to sample their time-series data with the SAMPLE BY clause.
You can run QuestDB-native queries directly from MindsDB using this syntax:
SELECT * FROM questdb (
<QuestDB-native query goes here>
);
Below we create views that store sampled data. Therefore, the QuestDB-native query is passed to the CREATE VIEW statement.
Let’s create a view that stores data sampled monthly.
CREATE VIEW chicago_weather_stations_sampled_monthly (
SELECT * FROM questdb (
SELECT MeasurementTimestamp,
StationName,
avg(coalesce(AirTemperature, 0)) AS AirTemperature,
avg(coalesce(WetBulbTemperature, 0)) AS WetBulbTemperature,
avg(Humidity) AS Humidity,
avg(coalesce(RainIntensity, 0)) AS RainIntensity,
avg(IntervalRain) AS IntervalRain,
avg(coalesce(TotalRain, 0)) AS TotalRain,
avg(coalesce(PrecipitationType, 0)) AS PrecipitationType,
avg(WindDirection) AS WindDirection,
avg(WindSpeed) AS WindSpeed,
avg(MaximumWindSpeed) AS MaximumWindSpeed,
avg(coalesce(BarometricPressure, 0)) AS BarometricPressure,
avg(SolarRadiation) AS SolarRadiation,
avg(coalesce(Heading, 0)) AS Heading,
avg(BatteryLife) AS BatteryLife
FROM chicago_weather_stations
SAMPLE BY 1M FILL(LINEAR) ALIGN TO CALENDAR
ORDER BY StationName, MeasurementTimestamp;
)
);
Now let’s create a view that stores data sampled daily.
CREATE VIEW chicago_weather_stations_sampled_daily (
SELECT * FROM questdb (
SELECT MeasurementTimestamp,
StationName,
avg(coalesce(AirTemperature, 0)) AS AirTemperature,
avg(coalesce(WetBulbTemperature, 0)) AS WetBulbTemperature,
avg(Humidity) AS Humidity,
avg(coalesce(RainIntensity, 0)) AS RainIntensity,
avg(IntervalRain) AS IntervalRain,
avg(coalesce(TotalRain, 0)) AS TotalRain,
avg(coalesce(PrecipitationType, 0)) AS PrecipitationType,
avg(WindDirection) AS WindDirection,
avg(WindSpeed) AS WindSpeed,
avg(MaximumWindSpeed) AS MaximumWindSpeed,
avg(coalesce(BarometricPressure, 0)) AS BarometricPressure,
avg(SolarRadiation) AS SolarRadiation,
avg(coalesce(Heading, 0)) AS Heading,
avg(BatteryLife) AS BatteryLife
FROM chicago_weather_stations
SAMPLE BY 1d FILL(LINEAR) ALIGN TO CALENDAR
ORDER BY StationName, MeasurementTimestamp;
)
);
The sampled data can be used to train time-series models.
Step 3. Train and deploy time-series models with MindsDB
MindsDB provides the CREATE MODEL statement used to create, train, and deploy AI models. In this tutorial, we use it to train and deploy time-series models.
Let’s start with the model that uses data sampled monthly.
CREATE MODEL airtemp_monthly_forecast_model
FROM mindsdb
(SELECT * FROM chicago_weather_stations_sampled_monthly)
PREDICT AirTemperature
ORDER BY MeasurementTimestamp
GROUP BY StationName
HORIZON 48 -- using sets of 4 years (48 months) to train the model
WINDOW 12; -- forecasting for the next 1 year (12 months)
The CREATE MODEL statement returns information about the created model, which is in the generating state at first. It may take some time for the model to complete training.
Use this statement to monitor the status of the model.
DESCRIBE airtemp_monthly_forecast_model;
Once the status reads complete, the accuracy of the model is provided. Here, it is 0.908.
Now let’s check how the accuracy differs if data is sampled daily.
CREATE MODEL airtemp_daily_forecast_model
FROM mindsdb
(SELECT * FROM chicago_weather_stations_sampled_daily)
PREDICT AirTemperature
ORDER BY MeasurementTimestamp
GROUP BY StationName
HORIZON 730 -- using sets of 2 years (730 days) to train the model
WINDOW 30; -- forecasting for the next 1 month (30 days)
The CREATE MODEL statement returns information about the created model, which is in the generating state at first. It may take some time for the model to complete training.
Use this statement to monitor the status of the model.
DESCRIBE airtemp_daily_forecast_model;
Once the status reads complete, the accuracy of the model is provided. Here, it is 0.780.
Step 4. Make forecasts
Now that we have both the input data and the AI models ready to use, let’s make some forecasts.
Join the AI model (airtemp_monthly_forecast_model) with the data table (chicago_weather_stations_sampled_monthly) and query for forecasts, as below.
SELECT m.MeasurementTimestamp, m.AirTemperature
FROM chicago_weather_stations_sampled_monthly AS d
JOIN airtemp_monthly_forecast_model AS m
WHERE d.MeasurementTimestamp > LATEST
AND d.StationName = 'Oak Street Weather Station';
Note that we select the timestamp column (MeasurementTimestamp) and the target column (AirTemperature) from the model table (airtemp_monthly_forecasts_model).
In the WHERE clause, we define a condition that ensures the forecasts are made for future timestamps that are not included in the training data (MeasurementTimestamp > LATEST).
And, as we grouped the data, we must choose for which group the forecasts are made (StationName = 'Oak Street Weather Station'). Note that these conditions are made on the columns from the data table (chicago_weather_stations_sampled_monthly).
Follow this process analogically when making forecasts for data sampled daily.
SELECT m.MeasurementTimestamp, m.AirTemperature
FROM chicago_weather_stations_sampled_daily AS d
JOIN airtemp_daily_forecast_model AS m
WHERE d.MeasurementTimestamp > LATEST
AND d.StationName = 'Oak Street Weather Station';
Try it out
If you want to influence the accuracy of the model, try working with these parameters:
Sampling frequency
--> Depending on the initial input data, lower sampling frequencies may increase the accuracy of the model. Like in this tutorial, data sampled monthly resulted in a better accuracy model than data sampled daily.
WINDOW and HORIZON parameters
--> Define the WINDOW and HORIZON parameter values logically based on the sampling frequency of data.
You can define the sampling frequency using the SAMPLE BY clause provided by QuestDB, as shown in Step 2. And, you can alter the WINDOW and HORIZON clauses in the CREATE MODEL statement provided by MindsDB, as shown in Step 3.
This is how you can make a better weather prediction, using MindsDB and QuestDB to improve the quality of time-series forecasting.
Posted on March 24, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.