As businesses move more of their processes to the cloud, it becomes crucial to adhere to best practices for security, performance, and cost optimization. Recognizing this need, public cloud providers offer frameworks that help partners and customers evaluate architecture designs against criteria such as efficiency and reliability.
In this article, Sebastian Gavril, Engineering Lead at Levi9, takes a closer look at Amazon Web Services’ (AWS) Well-Architected Framework, which provides customers with a consistent approach for measuring their architectures against best practices and identifying areas for improvement.
What is the Well-Architected Framework?
The Well-Architected Framework provides a set of pillars, or foundational concepts, that relate to critical aspects customers should consider when building cloud architectures. “It’s an architectural framework for designing and running workloads in the cloud,” explains Sebastian Gavril. A workload, he adds, can be anything from a simple “to-do list” application to a complex e-commerce platform.
The framework aims to guide companies in implementing best practices across six key “pillars”: Operational excellence, Security, Reliability, Performance efficiency, Cost optimization, and Sustainability. Each pillar provides design principles, best practices, and questions that help determine how well an architecture aligns with them.
The evolution of the Well-Architected Framework
The idea of Well-Architected originated in 2012 when AWS experienced a major outage that impacted many customers – but not all. “In 2013, a team of AWS solution architects investigated why some customers were affected by the outage while others continued business as usual. They noticed the group not impacted was doing certain things in a particular way,” notes Sebastian.
AWS formalized the findings into the first version of the Well-Architected Framework, published in 2015 with four pillars: Security, Reliability, Performance Efficiency and Cost Optimization. The framework kept growing over the following years, with Operational Excellence added in 2016 and Sustainability in 2021. In 2018, AWS also launched the AWS Well-Architected Tool to facilitate reviews.
Sebastian stresses that the Well-Architected Framework reflects emerging industry consensus, not just the opinions of AWS. Five of the pillars (Operational Excellence, Security, Reliability, Performance Efficiency and Cost Optimization) are common across the major providers: not just AWS, but also Microsoft Azure and Google Cloud Platform.
The six pillars of the Well-Architected Framework
Operational excellence
The first pillar focuses on the operational aspects of running cloud workloads efficiently. “It means the ability to run, administer, and monitor systems that add value,” Sebastian explains. One example of a design principle in this pillar is the ability to anticipate failures before they occur. “No one deploys code to production, hoping nothing bad happens.” Proactively monitoring for failures allows companies to achieve operational excellence.
Security
This critical pillar focuses on system and data security. One of its guiding principles may seem counterintuitive. Sebastian sums it up nicely: “Keep people away from data.” Rather than relying on error-prone manual processes, teams should manage data through automated tools and systems with proper access control policies.
Reliability
Reliability emphasizes building applications that both “perform the intended functions and quickly recover to meet changing demands,” as Sebastian puts it. A best practice under this pillar is to automatically recover from failure, thereby avoiding business disruption.
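As a minimal sketch of what automatic recovery can look like in application code, the helper below retries a failing operation with exponential backoff. The function name `call_with_recovery`, its parameters and the flaky operation are illustrative assumptions; they are not part of the framework or of any AWS API.

```python
import time


def call_with_recovery(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); after a failure, wait and retry with a doubling delay.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # no attempts left: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical dependency that fails twice before it succeeds.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary outage")
    return "ok"


delays = []  # record the waits instead of sleeping, so the example runs instantly
result = call_with_recovery(flaky_operation, sleep=delays.append)
```

In a real workload, managed services and infrastructure-level mechanisms (health checks, automatic failover) would usually carry most of this burden; the sketch only shows the idea at the smallest scale.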
Performance efficiency
This pillar deals with getting optimal performance from cloud infrastructure. One of its design principles involves democratizing access to advanced technologies. Sebastian explains that companies should leverage managed services whenever possible rather than spend precious time on lower-level infrastructure management.
Cost optimization
As cloud platforms have variable pricing, cost optimization requires continuously monitoring expenditure and right-sizing usage to meet business needs. The framework recommends, for example, an organizational role for cloud financial management that blends business and technical acumen with financial concerns.
Sustainability
As the most recent pillar, sustainability focuses on minimizing the environmental impact of cloud usage. Companies should architect solutions that avoid downstream waste. As one example, Sebastian notes that software updates should not force customers to discard still-functional devices.
Lenses for focused analysis
In addition to the standard framework, AWS offers different “lenses” tailored to various types of workloads, like serverless applications and machine learning systems. “Lenses allow you to look at a workload from a particular perspective. For example, if you have a serverless workload, some standard questions like ‘how do you patch servers’ don’t make sense since there are no servers!” says Sebastian.
Some of the lenses available beyond the default include Healthcare Industry, IoT, Data Analytics and many others. Further, AWS customers and partners can create custom lenses tailored to their industry or based on an internal workload classification system.
Why conduct Well-Architected reviews?
Sebastian learned a valuable lesson about conducting reviews as early as possible when building his own house. “I simply thought a balcony would be nice. But I did not consider the time we’d spend cleaning it, or I might have skipped the balcony. That was easier to do in the project phase; it’s much more difficult now.” This is why he advises that “it’s better to review sooner rather than later.”
There are three compelling reasons why a review may be the best option:
- Identify issues early: It’s much easier to address gaps in reliability, security, etc. if they are spotted early in the development process.
- Most workloads can be improved: Few companies score perfectly across all pillars. Reviews uncover areas needing enhancement.
- Credits for customers: Customers can earn up to $5,000 in AWS credits by fixing high-priority issues uncovered during official reviews.
How to conduct Well-Architected reviews
To properly conduct a Well-Architected review, Sebastian emphasizes that it should not be an audit-focused exercise. “You don’t have yes-or-no answers. We try to have an honest, constructive discussion.”
The team involved should include at least a technical lead and a business-minded team member. Levi9’s Well-Architected consultants can also help frame the questions and suggest best practices. Unlike an audit, the purpose is not to pass with flying colors but to identify potential risks and improvements, which should then be tracked and addressed.
Additional resources for Well-Architected Framework
The Well-Architected review demo
You can use the AWS Well-Architected Tool both to review your cloud applications and as a “reality check” on how robust your cloud architecture is. Here is a brief description of what you can expect during a Well-Architected review.
1. Define a workload
Open the AWS Well-Architected Tool and define a workload by giving it a name and a description, for example, “Work From Office Application.”
Select attributes like owners, regions, accounts, etc.
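For teams that prefer to script this step, the sketch below builds the keyword arguments for the `create_workload` operation of the boto3 `wellarchitected` client. The helper name `build_workload_request`, the description, the owner address and the region are assumptions made up for illustration; the actual API call is left as a comment because it needs AWS credentials.

```python
VALID_ENVIRONMENTS = {"PRODUCTION", "PREPRODUCTION"}


def build_workload_request(name, description, review_owner, regions,
                           environment="PRODUCTION", lenses=("wellarchitected",)):
    """Build keyword arguments for the wellarchitected create_workload call.

    The helper itself is hypothetical; only the dictionary keys follow the API.
    """
    if not name or not description:
        raise ValueError("a workload needs both a name and a description")
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    return {
        "WorkloadName": name,
        "Description": description,
        "Environment": environment,
        "ReviewOwner": review_owner,
        "AwsRegions": list(regions),
        "Lenses": list(lenses),
    }


workload_request = build_workload_request(
    name="Work From Office Application",
    description="Desk booking for hybrid teams",  # hypothetical description
    review_owner="cloud-team@example.com",        # hypothetical owner
    regions=["eu-west-1"],                        # hypothetical region
)

# With AWS credentials configured, the request could be sent like this:
# import boto3
# client = boto3.client("wellarchitected")
# workload_id = client.create_workload(**workload_request)["WorkloadId"]
```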
2. Activate Trusted Advisor
Within the AWS Well-Architected Tool, activate the AWS Trusted Advisor integration. This brings recommendations from Trusted Advisor into some of the Well-Architected questions.
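When a workload is managed through the API rather than the console, the same switch appears as a `DiscoveryConfig` entry on the workload. The snippet below is a small sketch of that setting; the helper `with_trusted_advisor` and the sample request are hypothetical, and the commented call assumes a workload that already exists.

```python
def with_trusted_advisor(workload_request, enabled=True):
    """Return a copy of a workload request with the Trusted Advisor setting added."""
    status = "ENABLED" if enabled else "DISABLED"
    updated = dict(workload_request)  # leave the caller's dictionary untouched
    updated["DiscoveryConfig"] = {"TrustedAdvisorIntegrationStatus": status}
    return updated


# Hypothetical, abbreviated request used only to show the resulting shape.
base_request = {"WorkloadName": "Work From Office Application"}
request_with_ta = with_trusted_advisor(base_request)

# For an existing workload, the same setting could be applied with:
# client.update_workload(WorkloadId=workload_id,
#                        DiscoveryConfig=request_with_ta["DiscoveryConfig"])
```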
3. Apply relevant lenses
The “Well-Architected Framework” lens is selected by default, providing the core Well-Architected questions.
For this demo, we also select the “Serverless” lens, so we can see how serverless-specific questions apply here.
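Lens selection can be scripted as well. As a rough sketch, the helper below works out which lens aliases still need to be attached to a workload; `lenses_to_associate` is a made-up name, and the commented `associate_lenses` call assumes a configured boto3 client and an existing workload ID.

```python
def lenses_to_associate(current_aliases, wanted_aliases):
    """Return the wanted lens aliases that are not yet on the workload.

    Order is preserved and duplicates are dropped.
    """
    current = set(current_aliases)
    missing = []
    for alias in wanted_aliases:
        if alias not in current and alias not in missing:
            missing.append(alias)
    return missing


current_lenses = ["wellarchitected"]               # the default lens
wanted_lenses = ["wellarchitected", "serverless"]  # default plus Serverless
missing_lenses = lenses_to_associate(current_lenses, wanted_lenses)

# The missing lenses could then be attached with:
# client.associate_lenses(WorkloadId=workload_id, LensAliases=missing_lenses)
```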
4. Answer questions
Each of the six pillars (security, reliability, and so on) has a set of questions to answer. For example, one question in the Cost Optimization pillar asks, “How do you decommission resources?” Answers to such a question can depend on each other: if you select “Implement a decommissioning process,” you also need to select “Track resources over their lifetime.” You can also choose none of the options and skip most of the questions, in which case, Sebastian warns, your scores will be very low.
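The dependency between answers can be pictured as a simple prerequisite check. The sketch below is not how the tool is implemented; the choice identifiers are invented for illustration (real identifiers come from the tool itself), and `missing_prerequisites` is a hypothetical helper.

```python
# Hypothetical choice identifiers for the question "How do you decommission resources?"
PREREQUISITES = {
    "implement_decommissioning_process": {"track_resources_over_lifetime"},
}


def missing_prerequisites(selected_choices, prerequisites=PREREQUISITES):
    """Map each selected choice to the prerequisite choices that were not selected."""
    selected = set(selected_choices)
    missing = {}
    for choice in selected_choices:
        needed = prerequisites.get(choice, set()) - selected
        if needed:
            missing[choice] = sorted(needed)
    return missing


incomplete = missing_prerequisites(["implement_decommissioning_process"])
complete = missing_prerequisites(
    ["track_resources_over_lifetime", "implement_decommissioning_process"]
)
```

Here `incomplete` reports that the decommissioning process was selected without resource tracking, while `complete` is empty because both options were chosen together.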
5. Use Trusted Advisor for a reality check
Some questions integrate with Trusted Advisor. This integration checks your answers against actual configurations.
As an example, take the fifth question in the Security pillar: “How do you protect your network resources?” If you chose “Control traffic at all layers,” you can activate the Trusted Advisor integration and check that answer against reality. The tool might find vulnerabilities you were not aware of, for instance security groups for which traffic is not controlled. In this way, the integration shows where you lack insight into your cloud workload.
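To make the idea of a reality check concrete, here is a simplified stand-in for such a check, not Trusted Advisor's actual logic. It scans security group definitions, in the dictionary shape returned by the EC2 `describe_security_groups` call, for inbound rules open to any address on ports outside an allowed set. The group IDs, the allowed ports and the function names are assumptions for illustration.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def is_exposed(rule, allowed_ports):
    """True if an inbound rule is open to any address on a port that is not allowed."""
    cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
    cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
    if not any(cidr in OPEN_CIDRS for cidr in cidrs):
        return False
    from_port, to_port = rule.get("FromPort"), rule.get("ToPort")
    if from_port is None or to_port is None:
        return True  # no port range given, e.g. a rule covering all traffic
    return not (from_port == to_port and from_port in allowed_ports)


def find_exposed_groups(security_groups, allowed_ports=(80, 443)):
    """Return the IDs of security groups with at least one exposed inbound rule."""
    return [
        group["GroupId"]
        for group in security_groups
        if any(is_exposed(rule, allowed_ports) for rule in group.get("IpPermissions", []))
    ]


# Hypothetical sample data in the describe_security_groups shape.
sample_groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-admin", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]

exposed = find_exposed_groups(sample_groups)
```

In this sample only the group with SSH open to the world is reported, which is the kind of gap between a stated answer and the actual configuration that the integration is meant to surface.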
6. Answer lens-specific questions
Because we applied an additional lens, Serverless, we also get a series of questions focused on serverless apps. One such example would be “How do you build resiliency into your serverless application?”, a question that only makes sense in this particular case.
7. Use recommendations
Upon completion, the tool provides a visualization of the medium and high risks for each pillar. In Sebastian’s experience, the reliability pillar tends to show the most risks. The tool also provides recommendations and insights, based on your answers, to improve the workload’s alignment with best practices.
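The same risk overview can be read programmatically. The sketch below ranks pillars by their number of high and medium risks, using a dictionary in the shape of the `LensReview` object returned by the boto3 `wellarchitected` client's `get_lens_review` call. The risk counts are invented sample data and `rank_pillars_by_risk` is a hypothetical helper.

```python
def rank_pillars_by_risk(lens_review):
    """Return (pillar_id, high, medium) tuples, riskiest pillar first."""
    ranked = []
    for summary in lens_review.get("PillarReviewSummaries", []):
        counts = summary.get("RiskCounts", {})
        ranked.append((summary["PillarId"], counts.get("HIGH", 0), counts.get("MEDIUM", 0)))
    # Sort by high risks, then medium risks (both descending), then by name for stability.
    ranked.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return ranked


# Invented sample data; a real review would come from:
# client.get_lens_review(WorkloadId=workload_id, LensAlias="wellarchitected")["LensReview"]
sample_review = {
    "PillarReviewSummaries": [
        {"PillarId": "security", "RiskCounts": {"HIGH": 2, "MEDIUM": 1}},
        {"PillarId": "reliability", "RiskCounts": {"HIGH": 5, "MEDIUM": 3}},
        {"PillarId": "costOptimization", "RiskCounts": {"MEDIUM": 4}},
    ]
}

ranking = rank_pillars_by_risk(sample_review)
```

A ranking like this can feed a backlog, so that the highest-risk pillar is addressed first and progress is visible from one review to the next.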
By applying the Well-Architected Framework early in the process, companies can be more confident that their cloud-based applications meet high standards for security, reliability and operational excellence. To make the point, Sebastian likes to quote Jeff Bezos: “Good intentions never work. You need good mechanisms to make anything happen.”