Lessons learned: Azure Reservations

oskarm93

Oskar Mamrzynski

Posted on November 5, 2024

Here's what I learned while using Azure Reservations:

Reserve NOW

If you have doubts whether you should reserve a Virtual Machine, just ask a question: "Will this VM still be there in 6 months?" If yes, reserve it for at least 1 year.

I can't tell you how much money we've wasted on Pay-As-You-Go (PAYG) rates because of indecisive managers and broken promises. I've been in so many situations where, after a year of deliberation, we still had not reserved anything because the powers that be couldn't decide whether the VMs would still be there. The next "migration project" is not a silver bullet for your infra spend.

Many costs are static: Domain Controllers, Build Agents, Virtual Desktops, Gateway servers, IIS servers etc. If they are likely to still be there in 6 months, chances are they will be around for longer, and you can save at least 30% by reserving for a year.

Reserve for longer

Reservations for Virtual Machines in Azure can be set to 1 year or 3 years. You can exchange any set of reservations for another set, as long as the total cost of the new set is greater than the remaining value of the existing ones.

To give you an example:

  • I reserve 50 nodes of D8ads_v5 for 3 years.
  • In 6 months' time, some teams want to use another VM size.
  • I use the Exchange button in the Reservations tab to cancel the existing reservation, and add a new reservation for 40 nodes of D8ads_v5 and 10 nodes of E8ads_v5 (memory optimised).
  • The remaining value of the 50-node reservation is now 2.5 years' worth (3 years minus 6 months). The new reservation can even be slightly lower in value due to different VM sizes and you can still exchange without any penalty.
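To make the exchange arithmetic concrete, here is a rough Python sketch of the rule as described above. The monthly prices are made up, `Reservation` and `can_exchange` are hypothetical names for illustration, and I am assuming that the new reservation starts a fresh 3-year term. It is not how Azure computes anything internally.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Reservation:
    vm_size: str
    nodes: int
    monthly_price_per_node: float  # hypothetical reserved price in USD
    term_months: int

    def total_commitment(self) -> float:
        """Cost of the whole term if nothing is exchanged or cancelled."""
        return self.nodes * self.monthly_price_per_node * self.term_months

    def remaining_value(self, months_elapsed: int) -> float:
        """Value of the part of the term that has not been used yet."""
        months_left = max(self.term_months - months_elapsed, 0)
        return self.nodes * self.monthly_price_per_node * months_left


def can_exchange(old: Reservation, months_elapsed: int,
                 new: Iterable[Reservation]) -> bool:
    """Rule from the post: new total cost must exceed the old remaining value."""
    new_total = sum(r.total_commitment() for r in new)
    return new_total > old.remaining_value(months_elapsed)


# 50 nodes for 3 years, exchanged after 6 months (prices are assumptions).
old = Reservation("D8ads_v5", 50, 200.0, 36)
new_mix = [
    Reservation("D8ads_v5", 40, 200.0, 36),
    Reservation("E8ads_v5", 10, 250.0, 36),
]
smaller = [Reservation("D8ads_v5", 42, 200.0, 36)]

print(old.remaining_value(6))         # value left on the old reservation
print(can_exchange(old, 6, new_mix))  # 40 + 10 node mix
print(can_exchange(old, 6, smaller))  # fewer nodes, but a fresh 3-year term
```

With these made-up prices the old reservation has 300,000 left on it, the 40 + 10 mix commits 378,000, and even the 42-node set (302,400) clears the bar because its term starts again from zero.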

[Image: reservation-exchange]

Point is: You do not have to fully commit to either 1 or 3 year reservations. Microsoft cares more about keeping you on the rollercoaster than about data centre capacity planning.

Better still: You can cancel (without exchanging) up to $50K worth of reservations in any rolling 12-month period without penalties.

Some reservations are greater than others

Virtual Machines have the greatest return on reservations, with savings anywhere between 30% and 65%.

Azure Cache for Redis reservations can cut costs by about the same amount because the service is primarily VM-based.

Azure Managed Disks reservations save about 10% off the PAYG price, but only for larger disks (>= 1024 GB).

Azure Databricks requires a lot of usage before a reservation pays off. I have not managed to hit the break-even point yet.

ALWAYS try to consolidate your Log Analytics workspaces and check the consolidated usage. Ingestion costs >$3 per GB at PAYG prices. You can cut that by at least 10% by committing to a daily ingestion volume. Log Analytics commitments are not a normal Reservation; they are a SKU tier set on the workspace resource itself.
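A quick Python sketch of why consolidation matters for commitment tiers. Every number here is an assumption for illustration: a flat $3 per GB PAYG price, a smallest tier of 100 GB per day, a 10% tier discount, and ingestion above the committed volume billed at the tier rate. Check the real tier table for your region before relying on any of it.

```python
PAYG_PER_GB = 3.00      # assumed PAYG ingestion price in USD
TIER_GB_PER_DAY = 100   # assumed smallest commitment tier
TIER_DISCOUNT = 0.10    # "at least 10%" from the text above


def daily_cost(ingested_gb: float) -> float:
    """Cheapest daily cost for one workspace: PAYG or the commitment tier."""
    payg = ingested_gb * PAYG_PER_GB
    tier_rate = PAYG_PER_GB * (1 - TIER_DISCOUNT)
    # The committed volume is paid in full even if less is ingested;
    # anything above it is assumed to be billed at the same tier rate.
    committed = max(ingested_gb, TIER_GB_PER_DAY) * tier_rate
    return min(payg, committed)


# Three workspaces, each too small to benefit from the tier on its own.
workspaces_gb = [30.0, 45.0, 40.0]

separate = sum(daily_cost(gb) for gb in workspaces_gb)
consolidated = daily_cost(sum(workspaces_gb))

print(f"separate workspaces:  ${separate:.2f} per day")
print(f"consolidated:         ${consolidated:.2f} per day")
```

Under these assumptions the three separate workspaces stay on PAYG at $345.00 per day, while a single 115 GB per day workspace crosses the tier threshold and costs $310.50.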

[Image: log-analytics-commitment-tiers]

Observe your utilisation

You want to hit that 100% utilisation on your reservations. For a lot of static infra this is a given.

Don't beat yourself up. Even if you miscalculate, your utilisation can drop down to 60% before you break even compared to PAYG price. Just exchange for a better number of nodes, or prepare for growth.
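The break-even point follows directly from the discount. A minimal sketch, assuming a 40% reservation discount (which is what a 60% break-even implies) and a hypothetical PAYG cost of 100 per period. The function names are mine, not anything from Azure.

```python
def break_even_utilisation(discount: float) -> float:
    """Utilisation at which a reservation costs the same as PAYG."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    # A reservation costs (1 - discount) of full PAYG regardless of use,
    # while PAYG costs 'utilisation' of full PAYG. They meet at 1 - discount.
    return 1 - discount


def saving_vs_payg(discount: float, utilisation: float,
                   payg_cost: float = 100.0) -> float:
    """Positive result: the reservation is cheaper than running on PAYG."""
    reserved_cost = payg_cost * (1 - discount)  # paid whether used or not
    payg_equivalent = payg_cost * utilisation   # what PAYG would have cost
    return payg_equivalent - reserved_cost


print(break_even_utilisation(0.40))  # break-even for a 40% discount
print(saving_vs_payg(0.40, 0.95))    # well utilised: positive saving
print(saving_vs_payg(0.40, 0.50))    # under-utilised: negative saving
```

A bigger discount pushes the break-even lower, so 3-year reservations tolerate more idle capacity than 1-year ones.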

You will sometimes hit bizarre scenarios where your monthly utilisation is <100% but you still get charged overage for that VM size in your subscription. This is because reservations are applied hour by hour, while utilisation is reported per day. Link here to the relevant doc.

We found this out because our Azure Databricks clusters would spin up a lot of VMs for 2 hours of the day to process batch jobs. This would spike the hourly utilisation to >100%, where overages are charged at PAYG price. Then, for the rest of the day, the cluster would coast at <100%. Daily utilisation was 95% while we were still charged $25 per day for overages.
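Here is a small Python sketch of how that happens. The node counts and the $0.50 per node-hour PAYG rate are invented to land near the numbers above. They are not our real figures, and `daily_report` is just an illustrative helper.

```python
def daily_report(hourly_demand, reserved_nodes, payg_per_node_hour):
    """Return (daily utilisation, overage cost) for 24 hourly samples."""
    # Each hour the reservation covers at most 'reserved_nodes' nodes.
    covered = sum(min(d, reserved_nodes) for d in hourly_demand)
    # Anything above that in the same hour is billed at PAYG.
    overage_node_hours = sum(max(d - reserved_nodes, 0) for d in hourly_demand)
    utilisation = covered / (reserved_nodes * len(hourly_demand))
    return utilisation, overage_node_hours * payg_per_node_hour


# 22 quiet hours at 19 nodes, then a 2-hour batch window at 45 nodes.
demand = [19] * 22 + [45] * 2

utilisation, overage_cost = daily_report(
    demand, reserved_nodes=20, payg_per_node_hour=0.50
)
print(f"daily utilisation: {utilisation:.1%}")
print(f"overage cost:      ${overage_cost:.2f}")
```

With these inputs the day reports roughly 95% utilisation and still produces $25.00 of overage, because unused reserved hours from the quiet period cannot be carried into the batch window.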

[Image: reservation-utilisation-list]

Same with our Dev/Test environments which we turn off for the weekend. You cannot "accumulate" reservation hours over the weekend to then burst out to higher cores during the working week. The 2 weekend days were essentially wasted reservation compute. Still better than PAYG though.

[Image: reservation-utilisation-chart]

Consider shared scope and billing subscription

You can choose how to scope your Reservations: to a resource group, subscription or every subscription (top-level management group).

Because our cost is all billed to 1 company, we always choose shared scope. It allows me to reserve "All AKS nodes" across all squads and all subscriptions. I don't have to go to each team individually and ask them how many nodes they plan to use. I just reserve the total as-is. If one team scales down by 2 but another team needs more because they're growing, shared scope will make sure the utilisation stays at 100%. Otherwise, I would have to constantly review and exchange lots of little reservations.
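The effect is easy to see in a toy Python model. The team names and node counts are hypothetical, and the two functions only model "one reservation per team" versus "one shared pool". Azure's real matching logic has more rules than this.

```python
def scoped(demand: dict, reserved: dict):
    """One reservation per team: unused nodes cannot help another team."""
    used = sum(min(demand[team], reserved[team]) for team in reserved)
    overage = sum(max(demand[team] - reserved[team], 0) for team in reserved)
    return used / sum(reserved.values()), overage


def shared(demand: dict, reserved: dict):
    """One shared pool: any team's nodes can consume any reserved node."""
    total_demand = sum(demand.values())
    total_reserved = sum(reserved.values())
    used = min(total_demand, total_reserved)
    overage = max(total_demand - total_reserved, 0)
    return used / total_reserved, overage


reserved = {"team-a": 10, "team-b": 10}
# team-a scaled down by 2, team-b grew by 2.
demand = {"team-a": 8, "team-b": 12}

print("per-team scope:", scoped(demand, reserved))  # (utilisation, PAYG nodes)
print("shared scope:  ", shared(demand, reserved))
```

Per-team scope ends up at 90% utilisation with 2 nodes on PAYG, while the shared pool stays at 100% with no overage, for exactly the same total demand.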

This advice won't apply to everyone because you may have strict cost reporting needs per subscription and different teams cannot dip into each other's purse.

All reservations need a billing subscription. This might not be the subscription that the Reservation applies to but it is where the cost will be shown. We have a shared subscription for centralized bootstrap infra as per Azure CAF. This is where our "shared" cost goes, e.g. all Kubernetes nodes. Just be prepared to explain big cost spikes each month to bean counters.

[Image: cost-per-month]

The other side effect of shared scope is that some subscriptions will show zero Virtual Machine cost despite having VMs. This is because the Reservation billed to the shared subscription is applied to them, bringing their PAYG cost to zero.
