Applying Machine Learning to AWS services

When working with Machine Learning, one can quickly be overwhelmed with technicalities and lose track of the original purpose: how can we use past data to answer future business questions?

This raises the question: is this really a game we want to play? Or could Machine Learning become transparent and just do its thing automatically?

As it turns out, a number of AWS services provide built-in Machine Learning features that require exactly zero work to use. Let’s review all of them and see how much of a free ride we can enjoy.

Compute — Predictive Auto Scaling

Sizing your compute infrastructure is probably the single hardest thing when building platforms. Capacity planning is a dark art, especially in fast-moving, unpredictable environments like startups. Of course, cloud computing has brought us elastic, on-demand virtual machines that all but completely solve that problem.

However, there is such a thing as too much infrastructure, and cost needs to be managed as well. For years, Amazon Auto Scaling has helped developers right-size their compute platforms. Still, it’s fair to say that tuning thresholds, alarms and scaling policies became a dark art of its own, as illustrated by this re:Invent breakout session featuring the almighty Netflix.

In the spirit of making things simpler, we introduced Predictive Auto Scaling at re:Invent 2018. Says Jeff Barr: “Using data collected from your actual EC2 usage and further informed by billions of data points drawn from our own observations, we use well-trained Machine Learning models to predict your expected traffic (and EC2 usage) including daily and weekly patterns”.

This new feature will save you a lot of experimenting and guesstimating… as well as mindless acts of random violence against keyboards, desks and coffee machines. The cool thing is that the model is re-evaluated every 24 hours in order to adapt to changing traffic patterns. You’ll learn more in this re:Invent breakout session, featuring Genesys PureCloud.
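
To make the idea of "daily and weekly patterns" a little more concrete, here is a deliberately naive sketch in Python. It is not the model used by Predictive Auto Scaling: it simply averages past load for each hour of the week. The function name and the synthetic traffic are invented for this illustration.

```python
from collections import defaultdict

HOURS_PER_WEEK = 24 * 7


def forecast_next_week(hourly_load):
    """Forecast each hour of the coming week as the average of that hour in past weeks."""
    if len(hourly_load) < HOURS_PER_WEEK:
        raise ValueError("need at least one full week of history")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, load in enumerate(hourly_load):
        slot = hour % HOURS_PER_WEEK  # position of this hour within its week
        totals[slot] += load
        counts[slot] += 1
    return [totals[slot] / counts[slot] for slot in range(HOURS_PER_WEEK)]


def synthetic_week(peak):
    """Made-up traffic: `peak` during office hours (09:00 to 18:00), 20 otherwise."""
    return [peak if 9 <= hour % 24 < 18 else 20 for hour in range(HOURS_PER_WEEK)]


history = synthetic_week(100) + synthetic_week(120)
forecast = forecast_next_week(history)
print(forecast[0], forecast[9])  # midnight vs. 09:00 on the first day
```

Recomputing the forecast every day on fresh history is the toy equivalent of the 24-hour re-evaluation mentioned above.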

Storage — Intelligent Tiering

Managing storage is another problem you quickly face when building platforms: more users, more partners, more data to log, etc. The flow never stops. Similar to compute, cloud storage needs to be both elastic and cost-efficient, which is exactly what Amazon S3 has strived for since its launch.

Over the years, additional storage classes have been introduced:

Amazon Glacier (2012), a low-cost storage service for data archives,
Infrequent Access (2015), for objects that are, well, infrequently accessed,
One Zone-Infrequent Access (2018), saving an extra 20% at the expense of reduced redundancy.

In true AWS fashion, developers could now automate the migration of their S3 objects from one class to the next by writing Lifecycle Configurations.
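
For readers who have never written one, here is a minimal sketch of such a Lifecycle Configuration, built as a plain Python dictionary. The rule ID, the `logs/` prefix, the bucket name and the 30/90-day thresholds are all made-up values. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`, which only appears in a comment so that the snippet runs without AWS credentials.

```python
def build_lifecycle_configuration(prefix, ia_after_days, glacier_after_days):
    """Build a rule moving objects under `prefix` to colder storage classes over time."""
    if ia_after_days >= glacier_after_days:
        raise ValueError("objects should reach Infrequent Access before Glacier")
    return {
        "Rules": [
            {
                "ID": "archive-" + prefix.strip("/"),  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_configuration("logs/", ia_after_days=30, glacier_after_days=90)

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket", LifecycleConfiguration=config)
print(config["Rules"][0]["Transitions"])
```

The catch is that the thresholds are static guesses about access patterns that you have to make, and revisit, yourself.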

The inevitable soon happened, with the launch of Intelligent Tiering at re:Invent 2018. Says Jeff Barr: “This storage class incorporates two access tiers: frequent access and infrequent access. Both access tiers offer the same low latency as the Standard storage class. For a small monitoring and automation fee, S3 Intelligent-Tiering monitors access patterns and moves objects that have not been accessed for 30 consecutive days to the infrequent access tier. If the data is accessed later, it is automatically moved back to the frequent access tier. The bottom line: You save money even under changing access patterns, with no performance impact, no operational overhead, and no retrieval fees”.
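
The rule described in this quote is simple enough to sketch in a few lines of Python. This is only a toy model of the behavior as stated in the announcement, not how S3 implements it; the function name and the dates are invented for the example.

```python
from datetime import date

INFREQUENT_AFTER_DAYS = 30  # threshold quoted in the announcement


def access_tier(last_access, today):
    """Return the access tier implied by the 30-day rule for a single object."""
    idle_days = (today - last_access).days
    return "infrequent" if idle_days >= INFREQUENT_AFTER_DAYS else "frequent"


today = date(2019, 1, 31)
print(access_tier(date(2019, 1, 20), today))  # last read 11 days ago
print(access_tier(date(2018, 12, 1), today))  # idle for 61 days
print(access_tier(today, today))              # read again today
```

The point of the storage class is precisely that you never have to write or tune this logic: the monitoring and the moves happen on the service side.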

You can learn more about Intelligent Tiering in this re:Invent breakout session, featuring Pinterest.

Storage — Data Protection

Scaling storage is not even half the story: what about making sure that your data stays safe? Of course, S3 provides features like bucket policies, ACLs or encryption to manage and protect your buckets and objects. Here’s a recent deep dive on these topics.

Still, mistakes can happen: incorrect configuration, dropping a sensitive file in the wrong location, making a bucket public, etc. Let’s face it, it’s a question of ‘when’, not ‘if’. When these issues pop up (and boy did they, lately), the only thing that matters is how fast you can detect and fix them: every second counts!

To help organizations avoid these issues, we launched Amazon Macie at re:Invent 2017. Macie notably uses a Support Vector Machine-based classifier to classify objects stored in your S3 buckets.

Says the doc: “This classifier, managed by Macie, was trained against a large corpus of training data of various types and has been optimized to support accurate detection of various content types, including source code, application logs, regulatory documents, and database backups. The classifier can also generalize its detections. For example, if it detected a new kind of source code that doesn’t match any of the types of source code that it is trained to recognize, it can generalize the detection as being source code”.

Macie also uses Machine Learning to analyze AWS CloudTrail logs in order to detect unauthorized access and data leaks. You can learn more about it in this breakout session, featuring Edmunds.com.

Network — Security Monitoring

A similar challenge arises for network security. Doing your best to keep the barbarians at bay is not enough. Sooner or later, a breach will happen, whether of their making or your own, and you’d better be prepared to detect it, log it, fix it and run a detailed forensic analysis to make sure it doesn’t happen again.

Unfortunately, building a solid security monitoring system is pretty complicated, even at reasonable scale: logging, analysing, adapting to new threats, remediating, etc. This is a whole new platform to build and manage…

For this reason, we launched Amazon GuardDuty at re:Invent 2017. Says Jeff Barr: “Informed by a multitude of public and AWS-generated data feeds and powered by machine learning, GuardDuty analyzes billions of events in pursuit of trends, patterns, and anomalies that are recognizable signs that something is amiss. You can enable it with a click and see the first findings within minutes”.

As soon as findings are available, they can be processed automatically for remediation, either by your own code or by 3rd party solutions. You can learn more about it in this re:Invent 2018 breakout, which shows you how to extend GuardDuty with popular solutions like Splunk and others.
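
To give an idea of what "your own code" could look like, here is a minimal sketch of a handler for a GuardDuty finding delivered as a CloudWatch Events event. The sample event is trimmed down to the few fields used here, the `isolate` and `notify` callbacks are hypothetical placeholders for your own remediation logic, and the routing policy is just one possible choice.

```python
HIGH_SEVERITY = 7.0  # GuardDuty labels findings at 7.0 and above as high severity


def handle_finding(event, isolate, notify):
    """Isolate the instance for high-severity findings, otherwise just notify."""
    if event.get("source") != "aws.guardduty":
        return "ignored"
    detail = event.get("detail", {})
    instance_id = (
        detail.get("resource", {}).get("instanceDetails", {}).get("instanceId")
    )
    if detail.get("severity", 0) >= HIGH_SEVERITY and instance_id:
        isolate(instance_id)  # e.g. swap security groups, snapshot volumes
        return "isolated"
    notify(detail.get("type", "unknown"))
    return "notified"


sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {
        "type": "Backdoor:EC2/C&CActivity.B!DNS",
        "severity": 8,
        "resource": {
            "resourceType": "Instance",
            "instanceDetails": {"instanceId": "i-0123456789abcdef0"},
        },
    },
}

isolated, notified = [], []
print(handle_finding(sample_event, isolated.append, notified.append))
```

In practice, a function like this would typically run in AWS Lambda, triggered by a rule matching the `aws.guardduty` event source.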

Analytics — Cleaning data

Once you’re reasonably happy with your storage infrastructure, you can start ingesting data, cataloguing it, cleaning it and preparing it for analytics. As you certainly know if you’re working with data, this step can take up to 80% of your time… as confirmed by the show of hands every time I ask the question!

Initially, customers built their own solutions by assembling services like Amazon S3, Amazon EMR, AWS Glue and so on. To make things simpler (again), we previewed AWS Lake Formation at re:Invent 2018.

One of the cool features of Lake Formation is ML transforms. These transforms help you clean your data automatically. Here are some examples from the FAQ:

Linking patient records between hospitals so that doctors have more background information and are better able to treat patients by using FindMatches on separate databases that both contain common fields such as name, birthday, home address, phone number, etc.

Deduplicating a database of movies containing columns like “title”, “plot synopsis”, “year of release”, “run time”, and “cast”. For instance, the same movie might be variously identified as “Star Wars”, “Star Wars: A New Hope”, and “Star Wars: Episode IV — A New Hope (Special Edition)”.

Automatically group all related products together in your storefront by identifying equivalent items in an apparel product catalog where you want to define “equivalent” to mean that they are the same ignoring differences in size and color. Hence “Levi 501 Blue Jeans, size 34x34” is defined to be the same as “Levi 501 Jeans — black, Size 32x31”.
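
To show why this kind of matching is harder than it looks, here is a hand-written toy for the movie example, in Python. It has nothing to do with how FindMatches works internally: it uses a single, crude rule (two titles match when all the words of one appear in the other), and every name in it is invented for the illustration.

```python
import re


def tokens(title):
    """Lower-case a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def same_movie(a, b):
    """Crude rule: match when the words of one title are all contained in the other."""
    ta, tb = tokens(a), tokens(b)
    return bool(ta) and bool(tb) and (ta <= tb or tb <= ta)


def group_titles(titles):
    """Greedily put each title in the first group that contains a matching title."""
    groups = []
    for title in titles:
        for group in groups:
            if any(same_movie(title, member) for member in group):
                group.append(title)
                break
        else:
            groups.append([title])
    return groups


titles = [
    "Star Wars",
    "Alien",
    "Star Wars: A New Hope",
    "Star Wars: Episode IV - A New Hope (Special Edition)",
]
for group in group_titles(titles):
    print(group)
```

This rule would also merge "Star Wars" with every sequel whose title starts with the same two words, which is exactly the kind of mistake that a transform learning from labeled examples and several columns (year of release, cast, etc.) is meant to avoid.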

You can learn more about AWS Lake Formation in this breakout.

Analytics — Streaming data

When working with real-time, streaming data, any latency in extracting insights should be kept to a minimum. As Amazon Kinesis is the preferred way to ingest streaming data, we added Machine Learning capabilities to Amazon Kinesis Data Analytics, an extension of Kinesis that lets you run SQL queries on streaming messages. At the moment, hotspot detection and anomaly detection are available.
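
For intuition only, here is what anomaly detection on a stream can look like in its simplest form, sketched in Python: each new value is compared to the running mean and standard deviation of the values seen so far. This is a basic statistical rule, not the algorithm used by Kinesis Data Analytics, and the class name, threshold and sample values are assumptions made for this example.

```python
import math


class RunningAnomalyDetector:
    """Flag values far from the running mean, processing one value at a time."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # number of standard deviations tolerated
        self.warmup = warmup        # values to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations (Welford's method)

    def observe(self, value):
        """Return True if `value` is anomalous relative to the values seen so far."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            anomalous = std > 0 and abs(value - self.mean) > self.threshold * std
        # Update the running statistics with the new value.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return anomalous


detector = RunningAnomalyDetector()
stream = [10, 11, 9, 10, 12, 10, 11, 95, 10]
flags = [detector.observe(value) for value in stream]
print([value for value, flagged in zip(stream, flags) if flagged])
```

Like a streaming SQL query, the detector never stores the stream itself: it keeps three numbers and updates them as messages arrive.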

You can learn more about the service in this re:Invent breakout, featuring Autodesk.

Reporting — ML insights

As mentioned at the beginning of this article, most Machine Learning is really about one single thing: using past data to answer future business questions. Thus, wouldn’t it be nice if we could simply extract insights from that data as we visualize it, instead of going through the usual cycle of building a model, predicting, etc.?

This is exactly what Amazon QuickSight now offers thanks to ML Insights, a new feature previewed at re:Invent 2018. For now, you can detect anomalies, forecast future trends, and build natural language narratives describing your dashboards.

I think this is an exciting new way to use Machine Learning without any expertise: you can learn more about it in this re:Invent breakout.

Conclusion

As you can see, there’s a growing amount of Machine Learning taking place under the AWS hood. Not only is this making services smarter, it also saves a ton of time and complexity: can you imagine having to build any of these by yourself?

I don’t gaze into crystal balls much, but I’m hoping to see significant progress in this direction for years to come. There’s no reason why Machine Learning can’t be made as pervasive as anything else, and it should be.

A few years back at re:Invent, Dr Werner Vogels introduced a strange new service called AWS Lambda with a slide saying: “No server is easier to manage than no server”. It took a while for a lot of us to understand what he meant, and now we see that serverless vision becoming reality.

So, in the same spirit, let’s hereby declare that:

No Machine Learning is easier to manage than no Machine Learning.

It’s going to be a long road, but we’ll get there. Agree? Disagree? Happy to discuss feature ideas and to answer questions, here or on Twitter.

As always, thanks for reading.
