Automating Well-Architected Principles

This series explores patterns I have observed in large companies scaling well-architected frameworks across the organization:

Part 1: Adopting a Well-Architected Framework across the Organization
Part 2: Automating Well-Architected Principles
Part 3: Customizing a Well-Architected Framework
Part 4: Establishing a Well-Architected Central Hub

Well-architected frameworks help organizations build secure, cost-efficient, and reliable applications. However, manually checking every software change is time-consuming, and it quickly becomes apparent that this approach does not scale. Product teams are small and nimble, but they don’t have spare people to evaluate environments manually, and delegating the assessment to other functions creates friction in the release process. Therefore, companies must automate well-architected checks as part of the software delivery process.

Automating well-architected checks is all about repeatability. Automation and policy-as-code apply the same rigor throughout the organization and reduce human error, which is the common denominator across cloud incidents.

Nearly all successful attacks on cloud services are the result of customer misconfiguration, mismanagement and mistakes.
Neil MacDonald, Gartner

This article explores policy-as-code guardrails that prevent workloads from drifting away from the organization’s well-architected principles. Guardrails are the technical implementation of the organization’s best practices and controls.

Three benefits of adopting well-architected guardrails

The first benefit is velocity. Success for modern development teams is measured by time-to-value, the time it takes for the team to develop and ship new features. Manual checks of non-functional requirements introduce friction into the delivery process and slow teams down.

The second benefit is risk reduction. In the cloud, teams provision the infrastructure for their applications. In doing so, they define their applications’ non-functional requirements (and, in many cases, redefine them daily). However, most organizations do not assess non-functional requirements such as security, cost optimization, and reliability beyond sampling workloads at a single point in time. Automating these checks makes the process repeatable, testable, shareable, and scalable to cloud environments of any size.

Finally, guardrails increase autonomy in the organization. Implementing guardrails as code splits the definition of a guardrail from its execution, reducing the overhead of human review processes. This separation of concerns allows teams such as security, finance, or compliance to spend more time on other significant sources of risk and value.

Automation splits the definition of well-architected controls from their execution
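To make the split between definition and execution concrete, here is a minimal Python sketch, assuming resources are represented as plain dictionaries. The guardrail IDs, owning teams, and resource attributes (`encrypted`, `tags`, `cost-center`) are hypothetical. Each function team maintains its own list of guardrail definitions, while one generic engine executes them against any resource.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guardrail:
    """A guardrail definition: owned by a function team, executed by the engine."""
    guardrail_id: str
    owner: str  # hypothetical owning team, e.g. "security" or "finance"
    description: str
    check: Callable[[dict], bool]  # returns True when the resource complies


# Definitions: maintained by the teams that own each concern (hypothetical rules).
SECURITY_GUARDRAILS = [
    Guardrail("SEC-001", "security", "Storage must be encrypted at rest",
              lambda r: r.get("encrypted") is True),
]
FINANCE_GUARDRAILS = [
    Guardrail("FIN-001", "finance", "Resources must carry a cost-center tag",
              lambda r: "cost-center" in r.get("tags", {})),
]


def evaluate(resource: dict, guardrails: list[Guardrail]) -> list[str]:
    """Execution: a generic engine that knows nothing about individual rules.

    Returns the IDs of the guardrails the resource violates.
    """
    return [g.guardrail_id for g in guardrails if not g.check(resource)]


if __name__ == "__main__":
    bucket = {"name": "reports", "encrypted": False, "tags": {"team": "data"}}
    print(evaluate(bucket, SECURITY_GUARDRAILS + FINANCE_GUARDRAILS))
    # prints ['SEC-001', 'FIN-001']
```

Adding a control means appending a definition; the pipeline that calls `evaluate` does not change. In practice, definitions are often written in a dedicated policy language such as Rego for Open Policy Agent (OPA); Python is used here only for illustration.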

A cultural shift is needed among supporting functions and platform engineering teams. Instead of focusing on auditing, central functions and platform teams should provide guardrails as self-service products that allow product teams to ship safely and independently.


Integrating well-architected guardrails into every step of the product development cycle improves the developer experience and reduces friction

I have always believed in the speed and nimbleness of having autonomous teams. But I learned firsthand that the faster you grow, the more fragmented and complex your software becomes. And then everything slows down again. Providing guardrails streamlines development and governance end to end:


Automation is fundamental to providing feedback and enforcing controls at different phases, and it has implications for each stage of the service lifecycle:

Design. Teams assess their well-architected responsibilities from the inception of a new product using a well-architected tool. Where possible, teams use blueprints developed in collaboration with platform engineering teams, ensuring that best practices are observed. A self-service catalog tool allows teams to discover blueprints and guardrails that accelerate their development.
Develop. Developers continuously implement their services against the automated guardrails. Instead of waiting for a specialist group to scrutinize their changes, developers test their software locally against unit tests, linters, and policy-as-code checks. This shortens the feedback loop on non-functional requirements such as encryption, availability, cost allocation, or observability.
Build and test. Engineers run guardrail tests (such as policy-as-code checks and linters) alongside automated functional and performance tests in their CI/CD pipelines. This keeps testing consistent and efficient, and it makes non-functional requirements explicit, so developers don’t waste time puzzling over how to satisfy ill-defined policies laid down by separate groups.
Admission & Deploy. Changes reach environments not through manual processes but through well-engineered automated processes that ensure the right non-functional requirements are built in and that each change is deployed securely and reliably.
Operate. Once the software has been provisioned, it is continuously monitored, assessed, and validated. Automation helps prevent compliance drift and reduces the mean time to discovery (MTTD). When defects or vulnerabilities are discovered, resolutions are identified, prioritized, and tracked so that the product’s non-functional requirements keep improving. Furthermore, continuous assessment of the cloud environments produces a fine-grained timeline that is useful for audits.
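As an illustration of the Build and test stage, the following Python sketch reads the JSON that `terraform show -json` produces for a saved plan and fails the pipeline step with a non-zero exit code when a planned resource violates a guardrail. It assumes the plan's `resource_changes[].change.after` structure; the required tag keys (`cost-center`, `owner`), the `check_plan` helper, and the script name `check_plan.py` are hypothetical examples of a cost-allocation policy, not part of any tool.

```python
import json
import sys

# Hypothetical cost-allocation policy: taggable resources must carry these keys.
REQUIRED_TAGS = {"cost-center", "owner"}


def check_plan(plan: dict) -> list[str]:
    """Return one message per planned resource that violates the tagging guardrail."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("mode") == "data":
            continue  # data sources are read, not provisioned
        change = rc.get("change", {})
        after = change.get("after")
        if after is None or change.get("actions") == ["no-op"]:
            continue  # deletions have no "after" state; no-ops change nothing
        if "tags" not in after:
            continue  # no known tags attribute (unsupported or unknown until apply)
        tags = after.get("tags") or {}
        missing = REQUIRED_TAGS - set(tags)
        if missing:
            violations.append(f"{rc['address']}: missing tags {sorted(missing)}")
    return violations


def main(path: str) -> int:
    """Load a plan JSON file, report violations, and return a process exit code."""
    with open(path) as f:
        plan = json.load(f)
    violations = check_plan(plan)
    for message in violations:
        print(f"GUARDRAIL VIOLATION: {message}", file=sys.stderr)
    return 1 if violations else 0  # non-zero fails the pipeline step


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_plan.py plan.json
    sys.exit(main(sys.argv[1]))
```

In a pipeline, this could run after `terraform plan -out=tfplan` and `terraform show -json tfplan > plan.json`. The same check can run on a developer's machine before committing, which is what keeps the Develop and Build and test feedback loops consistent.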

A critical element is a shared, self-service catalog that enables teams to share and consume the guardrails appropriate for their workloads.
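At its simplest, such a catalog is a registry that maps each published guardrail pack to the workload traits it applies to. The sketch below assumes a hypothetical schema: the pack IDs, publishing teams, and traits such as `internet-facing` or `pci` are illustrative, and a real catalog would more likely live in a service catalog tool than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A published guardrail pack in the shared catalog (hypothetical schema)."""
    pack_id: str
    publisher: str
    # Workload traits the pack applies to; an empty set means every workload.
    applies_to: frozenset[str] = frozenset()


CATALOG = [
    CatalogEntry("baseline-tagging", "finance"),
    CatalogEntry("baseline-encryption", "security"),
    CatalogEntry("public-endpoint-hardening", "security", frozenset({"internet-facing"})),
    CatalogEntry("pci-logging", "compliance", frozenset({"pci"})),
    CatalogEntry("multi-az-databases", "platform", frozenset({"database"})),
]


def guardrails_for(traits: set[str], catalog: list[CatalogEntry] = CATALOG) -> list[str]:
    """Return, in catalog order, the pack IDs a workload with these traits should consume."""
    return [e.pack_id for e in catalog if not e.applies_to or e.applies_to & traits]


if __name__ == "__main__":
    print(guardrails_for({"internet-facing"}))
    # prints ['baseline-tagging', 'baseline-encryption', 'public-endpoint-hardening']
```

Baseline packs apply everywhere, while specialized packs are pulled in only when a workload declares the matching trait, so publishers and consumers can evolve independently.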


Many breaches out there are not necessarily the result of an unknown risk, but are usually the result of some control that the organization thought they had not being deployed and operating when they needed it the most
Phil Venables, Chief Information Security Officer at Google Cloud

Learn how to implement guardrails as code

There are multiple ways to implement guardrails as code. This article does not cover the technical details, but I suggest the following resources:

Codify your best practices using service control policies. Part 1 and Part 2
How to automate Amazon EKS preventative controls in CI/CD using CDK and OPA/Conftest
Using OPA to create AWS Config rules
Using Open Policy Agent on Amazon EKS
Policy-driven continuous integration with Open Policy Agent
Building in compliance in your CI/CD pipeline with conftest

Conclusion

Many organizations have started to adopt infrastructure-as-code and DevOps as the de facto approach to achieving repeatable software builds. Security-as-code and DevSecOps aim to do the same for security. To build genuinely cross-functional teams that can own and run a capability within a business, teams need to extend their ownership beyond DevOps to other areas. Today, we can codify best practices for compliance, cost allocation, reliability, and auditing.

Well-architected best practices are evolving from guidance into a competitive advantage. Automation and guardrails remove friction from the development, provisioning, and operation processes, allowing teams to ship applications safely and quickly.

Security through code is all about repeatability. We implement automation and use of code for security purposes because it applies universal rigor throughout the organization. It solves the issue of human error that is the common denominator across cloud breaches.
Stephen Schmidt, Chief Security Officer at Amazon

These were some of my learnings from automating well-architected principles in big organizations. For startups and small businesses, the approach will be different. I would love to hear about your experience automating well-architected principles. Please reach out via LinkedIn or Twitter.