The Revolutionary Future of AutoML – Poised to Disrupt the AI Ecosystem Completely

Sagar Sidana

Posted on July 28, 2024

AutoML concept image

What is AutoML?

AutoML simplifies the work of a machine learning engineer by automating it.

Everything is being automated now, so AutoML does not sound as novel as it did when it first became mainstream a decade ago. Now we have automatic coding for everything. But AutoML is unique.

It allows generalists to train SOTA ML models without knowing much about the models themselves. There's a fascination to it. Perhaps it's the thrill of having AI build AI. Perhaps it's because it works at huge scale, without rest, and checks for common errors that a human might accidentally make.

Think of AutoML as an AI assistant to the data scientists of today. Given enough time and compute power, it can solve complex machine learning problems.

The average AutoML run evaluates tens to hundreds of candidate models and hyperparameter configurations, depending on the algorithm, the platform, and the compute budget allocated to it. It covers far more models and hyperparameters than a data scientist or ML engineer could by hand, and that makes it an indispensable tool today.
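
To make that scale concrete, here is a minimal sketch of what sampling and scoring a hundred hyperparameter candidates looks like. It uses plain scikit-learn rather than any vendor's AutoML API, and the dataset and search space are made up purely for illustration.

Python code:

from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, standing in for whatever the user uploads.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        'n_estimators': randint(50, 500),
        'learning_rate': uniform(0.01, 0.3),
        'max_depth': randint(2, 8),
    },
    n_iter=100,    # 100 candidate configurations...
    cv=5,          # ...each scored with 5-fold cross-validation
    n_jobs=-1,     # evaluated in parallel
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))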

And now, with agentic LLMs, the future of AutoML is looking very interesting. AIs building AIs was science fiction for a very long time. With agents and LLMs, it is happening today for real.

You can expect every cloud provider to build its own LLMs and fold them into its cloud AutoML offering, hugely speeding up the automation while maintaining data privacy.

Llama 3.1 405B is the first step into that future, which does not look very far away. AutoML is poised to change. It was earlier built on NAS (Neural Architecture Search) and reinforcement learning. Putting LLMs into AutoML will disrupt the entire industry. That is the future of all cloud computing platforms.

As you can imagine, nearly all the cloud AI platforms provide their own version of AutoML today. However, some are more prominent than others. A list of the offerings of the top vendors is given below.

Top Five Vendors Providing AutoML in the Cloud

Google Cloud AutoML

Website: https://cloud.google.com/automl

Overview:

  • Google Cloud AutoML is ideal for beginners in machine learning.
  • It allows users to create custom models for specific datasets.
  • The platform integrates seamlessly with other Google Cloud services.
  • It can handle various data types, including images, text, and tables.
  • Users benefit from a strong community and extensive documentation.

Pros:

  • The platform is very beginner-friendly and easy to use.
  • It automates crucial tasks like data preparation and model selection.

Cons:

  • There are fewer advanced customization options available.
  • It may not fully meet the needs of experienced data scientists.

Microsoft Azure AutoML

Website: https://azure.microsoft.com/en-us/solutions/automated-machine-learning

Overview:

  • Microsoft Azure AutoML offers extensive customization for model training.
  • It integrates easily with other Azure tools and services.
  • The platform closely monitors model performance for insights.
  • It supports a wide range of machine learning algorithms.
  • Comprehensive documentation is available to guide users.

Pros:

  • Users can take advantage of extensive customization capabilities.
  • The tools are well-designed, enhancing the overall user experience.

Cons:

  • The setup process can be complex and challenging.
  • There is a learning curve, even for those with some experience.

Amazon SageMaker Autopilot

Website: https://aws.amazon.com/sagemaker/autopilot/

Overview:

  • Amazon SageMaker Autopilot manages the entire machine learning workflow.
  • It automatically selects the best model for your dataset.
  • The platform explains how predictions are generated for transparency.
  • It integrates well with other AWS services for efficiency.
  • The system scales effectively for large machine learning tasks.

Pros:

  • It generates multiple effective models for user selection.
  • Strong integration with SageMaker Studio enhances functionality.

Cons:

  • Onboarding can be complicated for new users.
  • It requires AWS setup and some coding knowledge.

IBM Watson Studio AutoAI

Website: https://www.ibm.com/products/watson-studio/autoai

Overview:

  • IBM Watson Studio AutoAI simplifies data cleaning and preparation tasks.
  • The platform automatically fine-tunes models to enhance performance.
  • It encourages collaboration between data scientists and business teams.
  • Integration with other IBM tools enhances its overall capabilities.
  • Users can connect to various data sources for flexibility.

Pros:

  • It automates essential tasks like feature engineering and model tuning.

Cons:

  • Limited information is available about specific advantages and disadvantages.

H2O Driverless AI

Website: https://www.h2o.ai/

Overview:

  • H2O Driverless AI offers an open-source version for customization.
  • The platform automates feature generation and selection processes.
  • It provides insights into model predictions for better understanding.
  • It supports advanced deep learning algorithms for complex tasks.
  • The user-friendly interface is accessible to various skill levels.

Pros:

  • The open-source nature fosters a large and supportive community.
  • It automates many aspects of the machine learning workflow.

Cons:

  • More technical expertise may be required compared to some commercial options.

How does AutoML work?

AutoML process

AutoML automates the seven standard steps of every ML engineering task:

  1. Data Collection:

    • Collect the necessary datasets for the task.
    • Ensure the data is relevant and appropriate.
    • Identify any additional data sources needed.
    • Check the quality and completeness of the data.
    • Format the data for AutoML input.
  2. Data Cleaning:

    • Automatically address missing values and inconsistencies.
    • Use techniques like imputation and normalization.
    • Eliminate duplicates and irrelevant entries.
    • Ensure data is correctly formatted for modeling.
    • Validate the cleaned data for readiness.
  3. Feature Engineering:

    • Create new features to enhance performance.
    • Apply extraction, transformation, and selection methods.
    • Assess the importance of each feature.
    • Combine features for better insights.
    • Choose the best features based on their impact.
  4. Model Selection:

    • Test various machine learning algorithms.
    • Consider the type of task and dataset size.
    • Compare algorithms using metrics like accuracy.
    • Select the most suitable algorithm(s).
    • Balance complexity with performance.
  5. Training:

    • Simultaneously train multiple models on the dataset.
    • Use cross-validation for better generalization.
    • Monitor training and adjust hyperparameters.
    • Utilize parallel processing to speed up training.
    • Track the performance of each model.
  6. Hyperparameter Tuning:

    • Optimize hyperparameters for better results.
    • Explore combinations with grid or random search.
    • Evaluate each setting for effectiveness.
    • Weigh trade-offs between complexity and performance.
    • Test tuned models with validation data.
  7. Evaluation and Deployment:

    • Measure model performance with key metrics.
    • Compare models and pick the best one for deployment.
    • Provide insights into predictions and feature importance.
    • Prepare deployment-ready artifacts for use.
    • Monitor the deployed model and set alerts for issues.

There is a lot more, of course, that goes on behind the scenes, especially in the construction of the models, pipeline parallelism, hyperparameter optimization, and model selection. To read more about these, refer to the research paper in the References section [5].
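
To ground steps 4 through 7, here is a toy version of the core AutoML loop, again in plain scikit-learn and purely as an illustration (the candidate models and grids are arbitrary choices): several model families are each tuned with cross-validation, and the best overall estimator wins. Real AutoML systems replace the exhaustive grid with smarter search strategies such as Bayesian optimization, and layer automated feature engineering on top.

Python code:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 4: candidate model families, each with its own hyperparameter grid.
candidates = [
    (Ridge(), {'alpha': [0.1, 1.0, 10.0]}),
    (RandomForestRegressor(random_state=0),
     {'n_estimators': [100, 300], 'max_depth': [None, 10]}),
    (SVR(), {'C': [0.1, 1.0, 10.0], 'kernel': ['rbf', 'linear']}),
]

best_score, best_model = float('-inf'), None
for model, grid in candidates:
    # Steps 5-6: train each candidate and tune it with 5-fold cross-validation.
    search = GridSearchCV(model, grid, cv=5, n_jobs=-1)
    search.fit(X_train, y_train)
    if search.best_score_ > best_score:
        best_score, best_model = search.best_score_, search.best_estimator_

# Step 7: evaluate the winner on held-out data.
print(type(best_model).__name__, round(best_model.score(X_test, y_test), 3))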

Advantages of AutoML

  • AutoML speeds up how quickly we can develop models, and without requiring deep knowledge of those models.
  • It handles large datasets far more efficiently than a human, who might struggle through 2-3 weeks of labor (AutoML needs no sleep!).
  • By reducing human bias, it helps create fairer models. However, bias in AI can be hard to identify.
  • This technology makes machine learning easier for everyone. Industries that never used AI can adopt ML models quickly and easily. You just need basic Python skills.
  • It saves a lot of time compared to traditional methods. A human being would take months to do what AutoML can do in days.

Disadvantages of AutoML

  • It lacks the creativity of human experts and the experience of, say, a Kaggle grandmaster in ensembling ML models.
  • Some complex models can be hard to interpret and explain. Now that ML is used in many critical environments, explainability is hugely important.
  • Custom models often need human adjustments. The use case might require an entirely new type of deep-learning model, and AutoML will not be able to see that.

Currently, the common conclusion is that AutoML is not mature enough to operate on its own. That could change very soon. For now, however, the best approach is a hybrid one, using both machines and humans.

Predicting Stock Prices with AutoML using GCP, AWS, & Azure

Stock prediction

Google Cloud Platform: Leveraging AutoML Tables for NVIDIA Stock Prediction

  1. Prepare NVDA historical data in CSV format.
  2. Upload this file to Google Cloud Storage.
  3. Use AutoML Tables API to create a dataset.
  4. Import the data into the created dataset.
  5. Train a model using the imported data.
  6. Specify a training budget to control resource usage.
  7. Wait for the model training to complete.
  8. Use the trained model to make predictions.
  9. Provide current market data as input for predictions.
  10. The model outputs its forecast for the next closing price.

Python code:

from google.cloud import automl_v1beta1 as automl

# Illustrative sketch using the legacy AutoML Tables client; exact method
# signatures may vary by library version.
client = automl.TablesClient(project='your-project-id', region='us-central1')
dataset = client.create_dataset(dataset_display_name='NVDA_stock_prediction')
client.import_data(dataset=dataset, gcs_input_uris='gs://your-bucket/nvda_data.csv').result()
client.set_target_column(dataset=dataset, column_spec_display_name='close')
operation = client.create_model('NVDA_prediction_model', dataset=dataset,
                                train_budget_milli_node_hours=1000)
model = operation.result()  # blocks until training completes
prediction = client.predict(model=model,
                            inputs={'open': 280.5, 'high': 285.3, 'low': 279.8, 'volume': 50000000})
print(f"Predicted closing price: {prediction.payload[0].tables.value}")

Amazon Web Services: Harnessing SageMaker Autopilot for Stock Price Forecasting

  1. Store NVIDIA historical data in an S3 bucket.
  2. Set up an AutoML object using SageMaker Autopilot.
  3. Specify the target variable as the closing price.
  4. Set the problem type to regression for price prediction.
  5. Fit the model to the historical data in S3.
  6. Autopilot will test various algorithms and hyperparameters automatically.
  7. Deploy the best-performing model to a SageMaker endpoint.
  8. Use the deployed endpoint to make predictions.
  9. Provide current market data to get price forecasts.

Python code:

from sagemaker.automl.automl import AutoML
from sagemaker.serializers import CSVSerializer

auto_ml = AutoML(
    role='your-sagemaker-role-arn',
    target_attribute_name='close',
    output_path='s3://your-bucket/output/',
    problem_type='Regression',
    job_objective={'MetricName': 'MSE'},  # required whenever problem_type is set
    max_candidates=10
)
auto_ml.fit(inputs='s3://your-bucket/nvda_data.csv', wait=True)
predictor = auto_ml.deploy(initial_instance_count=1, instance_type='ml.m5.large',
                           serializer=CSVSerializer())
# The endpoint expects a CSV row of features in the training schema,
# minus the target column (here assumed to be open,high,low,volume).
result = predictor.predict('280.5,285.3,279.8,50000000')
print(f"Predicted closing price: {result}")

Microsoft Azure: Utilizing Automated Machine Learning for NVIDIA Stock Analysis

  1. Upload NVIDIA historical data to Azure Blob Storage.
  2. Create an AutoMLConfig object for the prediction task.
  3. Set the task type to regression for price forecasting.
  4. Specify the primary metric for model evaluation.
  5. Set parameters like cross-validation folds and maximum iterations.
  6. Submit the configuration as an experiment to Azure ML.
  7. The system tries different algorithms and hyperparameters automatically.
  8. Retrieve the best-performing model after the experiment completes.
  9. Use this model to make predictions on new data.
  10. Input current market data to get price forecasts.

Python code:

from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig
import pandas as pd

ws = Workspace.from_config()
dataset = Dataset.Tabular.from_delimited_files('https://your-storage-account.blob.core.windows.net/your-container/nvda_data.csv')
automl_config = AutoMLConfig(
    task='regression',
    primary_metric='normalized_root_mean_squared_error',
    training_data=dataset,
    label_column_name='close',
    n_cross_validations=5,
    max_concurrent_iterations=4,
    iterations=10,
    experiment_timeout_minutes=60
)
experiment = Experiment(ws, "NVDA-stock-prediction")
run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = run.get_output()  # best child run and its fitted pipeline
# The fitted pipeline expects a DataFrame with the training feature columns.
features = pd.DataFrame({'open': [280.5], 'high': [285.3], 'low': [279.8], 'volume': [50000000]})
prediction = fitted_model.predict(features)
print(f"Predicted closing price: {prediction[0]}")

Cautionary Warnings

  • These examples provide a basic approach to stock prediction.
  • Real-world stock prediction is much more complex.
  • Include a wider range of features in your data.
  • Perform thorough validation of your models' performance.
  • Consult with financial experts before making investment decisions.
  • Remember that past performance doesn't guarantee future results.
  • Don't rely solely on these predictions for investments.
  • Always consider other factors affecting stock prices.
  • Regularly update your models with new market data.
  • Monitor model performance and retrain as necessary (a minimal sketch follows this list).

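On the last two points, one simple monitoring pattern is sketched below; the 5% threshold and 30-day window are arbitrary placeholders, not recommendations. The idea is to track the rolling relative error of the deployed model on fresh market data and flag it for an AutoML retraining run once the error drifts too far.

Python code:

from collections import deque

class DriftMonitor:
    """Flags a deployed model for retraining when its rolling error drifts."""
    def __init__(self, window=30, threshold=0.05):
        self.errors = deque(maxlen=window)   # last `window` relative errors
        self.threshold = threshold

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual) / abs(actual))

    def needs_retraining(self):
        if len(self.errors) < self.errors.maxlen:
            return False                     # not enough evidence yet
        return sum(self.errors) / len(self.errors) > self.threshold

monitor = DriftMonitor()
monitor.record(predicted=282.1, actual=279.4)   # feed in each day's outcome
if monitor.needs_retraining():
    print("Rolling error above 5% - trigger an AutoML retraining run.")
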
Of course, these examples are not to be used for commercial purposes. They merely show how training hundreds of models, tuning hundreds of hyperparameters, and choosing the best among them can be delegated to roughly ten lines of code.

The significance of that result cannot be overstated.

Now people without deep expertise in AI or machine learning can use AutoML to create the best model available on the market.

And as automation goes further, we can expect even more steps to be taken over by AI. We must insist, however, that AutoML will still need some human guidance going forward. Of course, if AGI becomes a reality, everything will change.

And even agentic workflows in Generative AI are fast approaching AGI.

It's a great time to be alive!

The Future of AutoML

  1. AutoML is making AI more accessible to people without deep tech skills. Even basic Python skills will enable anyone to create an ML model easily.

  2. We might see AutoML become as common as spreadsheets in offices. Of course, in one way, with GitHub Copilot, Claude AI Agent, and Devin Autonomous AI Pair Programmer, this has already happened.

  3. AutoML tools could get smart enough to handle all kinds of data, which is only a matter of time since multimodal LLMs are already extremely well established. As cloud companies fold multimodal LLMs (MLLMs) into their AutoML offerings, expect vast improvements in the years ahead.

  4. We could get AutoML tailored for specific fields like medicine, the military, family needs, smart homes, stocks, transportation, finance – even AI itself in the form of Generative AI. AutoML could choose the best LLM for the task.

  5. AutoML will use edge computing for smarter gadgets and smarter phones. We are already seeing this with GPT-4o mini and the Apple iPhone 15 Pro.

  6. Future AI could go so far as to design its own neural networks from scratch. That would be very interesting: using AI to create foundational AI.

  7. AutoML systems will be able to explain their decisions, building trust with users. That could also solve a number of research challenges naturally.

  8. We could see AutoML creating custom datasets for unique problems. The Llama 3.1 405B model has already done exactly that, generating synthetic data for training purposes.

  9. Transfer learning in AutoML could reduce the need for massive datasets, as pretrained models are developed and shared. A new type of GitHub will be needed for these models. Something exactly like HuggingFace, which already exists.

  10. AutoML systems will learn to be more energy-efficient and eco-friendly. AI and GPUs are notoriously power-hungry, so greener pipelines will only grow in importance.
