Architecture Tenets, Production Readiness Reviews, and Technical Excellence


For digital companies, speed matters disproportionately. Many organizations are trying to improve their time-to-value, that is, how long it takes to put an idea into customers’ hands. However, the product teams supporting these efforts often need help maintaining the technical excellence that modern architectures require to enable frequent feature releases. This challenge exists primarily because product teams lack a structured approach to making architecture decisions and operational trade-offs.

Defining an architecture North Star helps teams work independently while aligning with organizational best practices

Architectural tenets help coordinate development that spans multiple teams. Mature digital companies use mechanisms to align teams’ decisions with organizational best practices. This alignment creates consistency across services in distributed architectures, which eases onboarding and the mobility of developers between teams while driving technical and operational excellence.

Our Reliability Manifesto is a succinct collection of rules, guidelines, and best practices that reflect our current thinking on what it takes to build a reliable system.

The Delivery Hero Reliability Manifesto

Amazon Simple Storage Service launched with ten design tenets. The service teams grew their system from eight to more than two hundred fifty distributed services over the last fifteen years, with hundreds of developers constantly launching new features while providing 99.999999999% (11 9’s) of data durability.

S3 cross team design tenets

Similarly, Twilio followed a set of architectural design principles that helped them to sustain growth and to minimize the impact of occasional but inevitable issues in underlying infrastructure.

Twilio system design tenets

In the paper “On Designing and Deploying Internet-Scale Services,” James Hamilton describes the tenets and overall application design behind the Windows Live Services Platform.

Microsoft system design tenets

Establishing clear architectural guidance and operational guardrails for teams improves the overall system design quality and reduces the time teams need to make decisions.

Performing production readiness reviews helps teams to operate their services consistently

Many organizations are adopting the “you build it; you run it” principle to increase teams’ autonomy. However, teams will need a certain level of maturity before operating successfully. A production readiness review is a helpful mechanism to support teams preparing new services. Implemented as a questionnaire or checklist, it gives teams guidance on what to think about and consider before bringing a new service into production.

Production readiness reviews guide teams on what categories of service levels to think of, what organizational standards to comply with, and what documentation is required. Many organizations use production readiness reviews as part of the go-live process. Grafana Labs and GitLab have made their production readiness review playbooks publicly available, and Google popularized the approach as part of the pager hand-off process in its site reliability engineering model.

Grafana Labs production readiness review

For organizations concerned that a review process could negatively impact a team’s ability to go live, having a definition of production readiness can at least provide some guidance and document the agreed-upon criteria for the organization.
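
As an illustration only, a definition of production readiness can be written down as a small checklist that a team evaluates before go-live. The sketch below is a minimal example in Python. The categories mirror the ones mentioned above (service levels, standards, documentation), but every question, the `Criterion` type, and the `review` function are hypothetical and are not taken from any of the playbooks referenced in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    category: str          # e.g. "service levels", "standards", "documentation"
    question: str          # what the team is asked to confirm
    required: bool = True  # optional criteria never block go-live


# Hypothetical criteria; a real organization would agree on its own list.
CRITERIA = [
    Criterion("service levels", "Are SLOs defined and monitored?"),
    Criterion("service levels", "Is on-call ownership assigned?"),
    Criterion("standards", "Does the service emit structured logs?"),
    Criterion("documentation", "Is there a runbook for common failures?"),
    Criterion("documentation", "Is an architecture diagram available?", required=False),
]


def review(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (ready, gaps). Unanswered questions count as not met."""
    gaps = [
        c.question
        for c in CRITERIA
        if c.required and not answers.get(c.question, False)
    ]
    return (not gaps, gaps)


# Example: a team that has not written its runbook yet.
answers = {
    "Are SLOs defined and monitored?": True,
    "Is on-call ownership assigned?": True,
    "Does the service emit structured logs?": True,
}
ready, gaps = review(answers)
print(ready, gaps)
```

Keeping the criteria as plain data makes the agreed-upon definition easy to review and version, whether or not it is enforced as a formal gate.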

Using a Well-Architected Framework drives the adoption of actionable architecture principles

It’s easy for teams to lose sight of how to make decisions and trade-offs and get distracted by the nuances of everyday software delivery challenges. Organizations should define mechanisms to align teams, make decisions, and sustain technical excellence. A Well-Architected Framework helps teams understand the pros and cons of their choices across areas such as security, reliability, operational excellence, performance efficiency, cost optimization, and sustainability.

Cloud providers have published well-architected frameworks on how best to architect solutions on their platforms. Teams should use them as a starting point to develop their own frameworks. In a series of articles, I am starting to explore how to create, adopt, and scale architectural and operational principles across an organization using well-architected frameworks.

Failures are a given, and everything will eventually fail over time

Werner Vogels, CTO of Amazon

If you have embarked on a similar journey, I would love to hear about it. Please reach out.
