In machine learning and deep learning, it is well known that there is no single "best" algorithm, let alone a standard set of hyperparameters: it all depends on the data. So when it is unclear which algorithm to use, or, put another way, when we want to find the technique that reaches the best performance on a given dataset, it is advisable to build a baseline.

In this post, I review the new feature of the AWS forecasting service, Amazon Forecast, explain the two methods that offer a first approach to time series forecasting with machine learning algorithms, and finally compare these methods through an example.

A brief review of Amazon Forecast

Amazon Forecast is the AWS forecasting service. It lets you create forecasts in a few simple steps, for millions of records, using machine learning algorithms based on techniques that have been used at Amazon.com for more than 20 years.

The main aim of Amazon Forecast is to make forecasting accessible without requiring expert machine learning development skills. It offers everything from classic models to complex deep learning models such as DeepAR+ and CNN-QR, and guides the process through a comfortable interface. It currently supports incorporating weather information and holiday data, depending on the country being forecast. In addition, Amazon Forecast has a new feature for explaining the impact of covariates, in other words, how the related time series affect the predictor. The following image shows the core of the service:

However, when we want to start testing with Amazon Forecast and it is not clear which algorithm to use, the service offers two alternatives:

AutoML: The search for the best algorithm

A classic strategy for starting any machine learning project, forecasting included, is to build a baseline of algorithm results and select the technique that performs best. AutoML responds to this need by making a series of algorithms of different kinds available, training each of them, and returning by default the model with the best metrics. In addition, the performance of every trained model can be reviewed in the Amazon Forecast predictor.

AutoPredictor: The search for the best combination

Some time series phenomena behave differently across periods, such as seasons of the year, holidays, or unexpected events. They produce observations that are difficult to predict, and a single algorithm can rarely handle the complete series. A recommended but hard-to-implement methodology is to ensemble models for a time series, that is, to look for the combination of models that best fits the target modeling scenario. Amazon Forecast has developed functionality that builds this ensemble without any programming or advanced development. This feature is the AutoPredictor, and it is currently the default predictor of the forecasting service. Next, we test this idea with an example that compares the AutoPredictor with AutoML.
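The comparison in this post creates the AutoPredictor from the console. For reference, a minimal sketch of an equivalent request through boto3's `create_auto_predictor` could look as follows. The helper `build_auto_predictor_request`, the predictor name, and the dataset group ARN are hypothetical; the horizon, frequency, quantiles, and holiday country mirror the setup used in this post:

```python
def build_auto_predictor_request(name, dataset_group_arn, horizon=14,
                                 frequency="D",
                                 quantiles=("0.1", "0.5", "0.9"),
                                 holiday_country="CL"):
    """Assemble keyword arguments for the Forecast create_auto_predictor call."""
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,
        "ForecastFrequency": frequency,
        "ForecastTypes": list(quantiles),
        "DataConfig": {
            "DatasetGroupArn": dataset_group_arn,
            # Built-in holidays dataset for the selected country
            "AdditionalDatasets": [
                {
                    "Name": "holiday",
                    "Configuration": {"CountryCode": [holiday_country]},
                }
            ],
        },
    }


# Hypothetical name and ARN, used only to show the shape of the request.
request = build_auto_predictor_request(
    name="example_auto_predictor",
    dataset_group_arn="arn:aws:forecast:us-east-1:123456789012:dataset-group/example",
)
print(request)

# To submit it (requires AWS credentials and an existing dataset group):
# import boto3
# boto3.client("forecast").create_auto_predictor(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect or version before anything is sent to the service.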

Performance comparison

The dataset used to compare the new Amazon Forecast feature with AutoML comes from a retail context, where the main objective is an accurate prediction of the target in order to improve planning in one of a customer's internal processes. The dataset contains 340 observations at daily frequency. The forecast horizon for training was set to 14 days, predicting the 0.1, 0.5, and 0.9 quantiles. The data schema requested by Amazon Forecast is shown below:
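A minimal sketch of what a target time series schema can look like in the format accepted by boto3's `create_dataset`. The three attributes are an assumption based on the required fields of the CUSTOM domain (the AutoML code in this post featurizes `target_value`); the author's actual schema may contain different or additional columns:

```python
# Assumed schema, not the author's: the attribute order must match the
# column order of the CSV file that is imported into the dataset.
target_schema = {
    "Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
        {"AttributeName": "item_id", "AttributeType": "string"},
    ]
}

attribute_names = [attr["AttributeName"] for attr in target_schema["Attributes"]]
print(attribute_names)
```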

To create the predictor with AutoML, we work from SageMaker through the boto3 ForecastService client, which allows an end-to-end process to be deployed. The following code, run in SageMaker, creates the predictor in Amazon Forecast:

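import boto3

# Amazon Forecast client (the "ForecastService" client in boto3)
forecast = boto3.client("forecast")

# FORECAST_LENGTH (14 days), DATASET_FREQUENCY (daily) and dataset_group_arn
# are assumed to be defined earlier in the notebook.
create_predictor_response = forecast.create_predictor(
    PredictorName="smu_iteracion_1_predictor0",
    ForecastHorizon=FORECAST_LENGTH,
    PerformAutoML=True,  # let Forecast train the candidate algorithms and pick one
    PerformHPO=False,
    EvaluationParameters={
        "NumberOfBacktestWindows": 4,
        "BackTestWindowOffset": 21,
    },
    InputDataConfig={
        "DatasetGroupArn": dataset_group_arn,
        # Built-in holidays featurization for Chile
        "SupplementaryFeatures": [{"Name": "holiday", "Value": "CL"}],
    },
    FeaturizationConfig={
        "ForecastFrequency": DATASET_FREQUENCY,
        "Featurizations": [
            {
                "AttributeName": "target_value",
                "FeaturizationPipeline": [
                    {
                        "FeaturizationMethodName": "filling",
                        # Aggregate by sum; fill middle and trailing gaps with zero
                        "FeaturizationMethodParameters": {
                            "aggregation": "sum",
                            "middlefill": "zero",
                            "backfill": "zero",
                        },
                    }
                ],
            }
        ],
    },
)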

On the other hand, the Amazon Forecast console works with the AutoPredictor by default, so we generate that predictor from the console. In both cases the holidays feature was included (the selected country was Chile). The next image shows both predictors once created; the global metrics immediately indicate that the AutoPredictor performs better.

The metrics associated with the model's behavior at specific points of the estimated probability distribution also indicate that, on the analyzed quantiles (0.1, 0.5, and 0.9), the AutoPredictor is more accurate than AutoML. Remember that these metrics deserve closer analysis when overforecasting or underforecasting carries a high cost for the business.
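To make the quantile metrics concrete, the sketch below computes the weighted quantile loss (wQL) following the definition in the Amazon Forecast documentation: wQL[τ] = 2 · Σ [τ · max(y − q, 0) + (1 − τ) · max(q − y, 0)] / Σ |y|, where y is the actual value and q the predicted τ-quantile. The function name and the toy numbers are illustrative only:

```python
def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """wQL for one quantile tau, given actual values and predicted tau-quantiles."""
    total = 0.0
    for y, q in zip(actuals, quantile_forecasts):
        under = max(y - q, 0.0)  # forecast below the actual value
        over = max(q - y, 0.0)   # forecast above the actual value
        total += tau * under + (1.0 - tau) * over
    return 2.0 * total / sum(abs(y) for y in actuals)


# Toy data, invented for illustration
actuals = [10.0, 20.0, 30.0]
low_forecast = [8.0, 18.0, 28.0]    # always 2 units under the actual value
high_forecast = [12.0, 22.0, 32.0]  # always 2 units over the actual value

loss_under = weighted_quantile_loss(actuals, low_forecast, tau=0.9)
loss_over = weighted_quantile_loss(actuals, high_forecast, tau=0.9)
print(loss_under, loss_over)
```

At the 0.9 quantile the same 2-unit error is penalized nine times more when it is an underforecast than when it is an overforecast, which is why the quantile to watch depends on which kind of error is more costly for the business.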

For evaluation on data outside the training sample, we consider 14 days. The next figure shows the behavior of both models: the AutoML curve is more erratic than the AutoPredictor's, and for March 25 the AutoML model even fails to capture the real value within its quantile range. Remember that Amazon Forecast produces probabilistic predictions: the value predicted at P10, for example, is read as a 10% probability that the target of the time series is less than or equal to that value.
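The check described above (does the real value fall inside the predicted quantile range?) can be sketched in a few lines. The dates and values below are invented for illustration and are not the data behind the figure; `days_outside_interval` is a hypothetical helper:

```python
def days_outside_interval(actuals, p10, p90):
    """Return the dates whose actual value falls outside the [P10, P90] range."""
    return [day for day, y in actuals.items() if not (p10[day] <= y <= p90[day])]


# Invented example values keyed by date (month-day)
actuals = {"03-24": 105.0, "03-25": 160.0, "03-26": 98.0}
p10 = {"03-24": 90.0, "03-25": 95.0, "03-26": 85.0}
p90 = {"03-24": 120.0, "03-25": 130.0, "03-26": 115.0}

missed = days_outside_interval(actuals, p10, p90)
coverage = 1 - len(missed) / len(actuals)
print(missed, coverage)
```

If the P10 and P90 forecasts were well calibrated, roughly 80% of the real values would be expected to fall inside the range over a long enough evaluation period.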

Discussion and conclusion

This post reviewed the Amazon Forecast tools for creating a multi-algorithm baseline or test, discussed the advantage of using a set of models for a time series, and showed how Amazon Forecast makes this feature simple for users. The next step is to evaluate related time series and the new variable explanation function, and to investigate how to obtain more information about the model ensemble in order to understand which models were relevant in explaining the target time series.

Amazon Forecast: Models performance employing AutoML and AutoPredictor (New feature)

nicolaspinocea
