Appendix: Reliability (Failure Management) - AWS Well-Architected Framework Study Guide

aidutcher

Alec Dutcher

Posted on March 7, 2022

Appendix: Reliability

How do you back up data?

Identify and back up all data that needs to be backed up, or reproduce the data from sources
Secure and encrypt backups
Perform data backup automatically
Perform periodic recovery of the data to verify backup integrity and processes
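
As a rough illustration of the last point, a recovery check can be as simple as restoring a backup and comparing checksums against the original. The sketch below uses local files only; the function names `sha256_of` and `backup_and_verify` are made up for this example and do not correspond to any AWS API.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path, restore_dir: Path) -> bool:
    """Copy source into backup_dir, restore it into restore_dir, compare digests."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    restore_dir.mkdir(parents=True, exist_ok=True)
    backup_copy = Path(shutil.copy2(source, backup_dir / source.name))
    restored_copy = Path(shutil.copy2(backup_copy, restore_dir / source.name))
    # The backup only counts as good if the restored bytes match the original.
    return sha256_of(source) == sha256_of(restored_copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "orders.csv"  # hypothetical data file
        original.write_text("id,total\n1,9.99\n")
        print(backup_and_verify(original, root / "backup", root / "restore"))
```

In a real workload the copy steps would be replaced by whatever backup and restore mechanism is in use; the point is that the check runs on a schedule and exercises the restore path, not just the backup path.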

How do you use fault isolation to protect your workload?

Deploy the workload to multiple locations
Automate recovery for components constrained to a single location
Use bulkhead architectures to limit scope of impact
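
One common way to picture a bulkhead at the code level is a fixed pool of slots per dependency, so that a slow dependency can only exhaust its own slots. The class below is a minimal sketch under that assumption; `Bulkhead` is a hypothetical helper, and the framework itself describes bulkheads at the architecture level (for example, cell-based partitioning), not as this class.

```python
import threading


class Bulkhead:
    """Cap concurrent calls to one dependency; reject extras instead of queueing."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: when the compartment is full, fail fast so the
        # caller is not dragged down by a saturated dependency.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"bulkhead '{self.name}' is full")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


if __name__ == "__main__":
    payments = Bulkhead("payments", max_concurrent=2)  # hypothetical dependency
    print(payments.call(lambda: "charged"))
```

Giving each dependency its own `Bulkhead` instance keeps a failure in one from consuming the capacity reserved for the others.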

How do you design your workload to withstand component failures?

Monitor all components of the workload to detect failures
Fail over to healthy resources
Automate healing on all layers
Use static stability to prevent bimodal behavior
Send notifications when events impact availability
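
The monitoring and failover items can be sketched together: probe each resource and route to the first one that reports healthy. This is a simplified, assumption-laden example; `first_healthy` and the endpoint names are hypothetical, and in AWS this role is normally played by managed health checks on load balancers or DNS rather than application code.

```python
from typing import Callable, Iterable, Optional


def first_healthy(
    endpoints: Iterable[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated the same as an unhealthy result.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical health state; a real check would make a network request.
    status = {"primary.example.internal": False, "standby.example.internal": True}
    print(first_healthy(status, lambda name: status[name]))
```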

How do you test reliability?

Use playbooks to investigate failures
Perform post-incident analysis
Test functional requirements
Test scaling and performance requirements
Test resiliency using chaos engineering
Conduct game days regularly
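
Chaos engineering is normally run against real infrastructure, but the underlying idea, injecting a fault on purpose and checking that the workload copes, can be shown in miniature. In this sketch `FlakyDependency` and `call_with_retries` are hypothetical names invented for the example.

```python
class FlakyDependency:
    """Stand-in for a downstream call that fails its first N invocations."""

    def __init__(self, failures: int) -> None:
        self.remaining_failures = failures
        self.calls = 0

    def fetch(self) -> str:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("injected fault")
        return "ok"


def call_with_retries(func, attempts: int):
    """Call func up to `attempts` times, re-raising the last ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except ConnectionError as error:
            last_error = error
    raise last_error


if __name__ == "__main__":
    dependency = FlakyDependency(failures=2)
    # Two injected faults, three attempts: the third call should succeed.
    print(call_with_retries(dependency.fetch, attempts=3), dependency.calls)
```

The experiment has a hypothesis (the caller survives two consecutive failures) and a measurable outcome, which is the same shape a larger chaos experiment takes.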

How do you plan for disaster recovery (DR)?

Define recovery objectives for downtime and data loss
Use defined recovery strategies to meet the recovery objectives
Test the disaster recovery implementation to validate that it works
Manage configuration drift at the DR site or region
Automate recovery
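
The recovery objectives for downtime and data loss are usually expressed as a recovery time objective (RTO) and a recovery point objective (RPO). A small sketch of checking a DR test result against them follows; the names and the numbers are illustrative assumptions, and worst-case data loss is approximated here as the interval between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of data loss


def meets_objectives(
    objectives: RecoveryObjectives,
    measured_recovery_time: timedelta,
    backup_interval: timedelta,
) -> bool:
    """Compare a measured DR test against the stated objectives."""
    # Assumption: data written since the last backup is lost, so the backup
    # interval bounds the worst-case data loss.
    return (
        measured_recovery_time <= objectives.rto
        and backup_interval <= objectives.rpo
    )


if __name__ == "__main__":
    targets = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
    print(meets_objectives(targets, timedelta(hours=3), timedelta(minutes=30)))
```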
