
Architecture

Workloads following modern architecture styles (e.g. microservices) typically comprise multiple components. Each component has configuration that controls how it behaves. This configuration may be a mechanism that controls which features are enabled per environment in order to decouple release from deployment (e.g. Feature Flags), or operational configuration (e.g. log level, throttling thresholds, connection/request limits, alerts, notifications).

The dynamic configuration pipeline enables teams to manage configuration for an entire workload and all its components in all environments as code so that all configurations are tracked in a code versioning system and can follow the common code review/approval process (e.g. pull/merge requests). The dynamic configuration pipeline can roll out configuration changes in a progressive and safe way to ensure that configuration changes do not break the workload in any environment.

Dynamic Configuration Pipeline Architecture

Local Development

Developers need rapid feedback to make them aware of potential issues with their code. Automation should run in their development environment to give them feedback before the deployment pipeline runs.

Pre-Commit Hooks

Pre-Commit hooks are scripts that are executed on the developer's workstation when they try to create a new commit. These hooks have an opportunity to inspect the state of the code before the commit occurs and abort the commit if tests fail. An example of a pre-commit hook mechanism is Git hooks. Examples of tools to configure and store pre-commit hooks as code include but are not limited to husky and pre-commit.
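As a sketch, the check itself can be any script that exits non-zero to abort the commit. The example below is a hypothetical Node script (e.g. wired up through husky); it assumes Jest for tests and the gitleaks CLI for secrets scanning, and command names vary by tool version.

```typescript
// pre-commit.ts — minimal sketch of a pre-commit check (hypothetical file name).
// Git aborts the commit if this process exits with a non-zero code.
import { execSync } from 'node:child_process';

try {
  // Keep pre-commit checks fast; full suites belong in the pipeline's Build stage.
  execSync('npx jest --onlyChanged', { stdio: 'inherit' });        // assumes Jest
  execSync('npx gitleaks protect --staged', { stdio: 'inherit' }); // gitleaks v8; scans staged changes
} catch {
  console.error('Pre-commit checks failed; aborting commit.');
  process.exit(1);
}
```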

Source

The source stage pulls in various types of code from a distributed version control system such as Git.

Infrastructure Source Code

Code that defines the infrastructure necessary to host the Dynamic Configuration. Examples of infrastructure source code include but are not limited to AWS Cloud Development Kit, AWS CloudFormation and HashiCorp Terraform. All Infrastructure Source Code is required to be stored in the same repository as the dynamic configuration definitions to allow infrastructure to be created and updated on the same lifecycle as the Dynamic Configuration.
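As a minimal sketch (not the reference implementation), Infrastructure Source Code that hosts dynamic configuration in AWS AppConfig might look like the following CDK stack; all names are illustrative.

```typescript
// lib/dynamic-config-stack.ts — illustrative CDK Infrastructure Source Code.
import { Stack, StackProps } from 'aws-cdk-lib';
import { CfnApplication, CfnEnvironment } from 'aws-cdk-lib/aws-appconfig';
import { Construct } from 'constructs';

export class DynamicConfigStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // One AppConfig application for the workload...
    const app = new CfnApplication(this, 'App', { name: 'my-workload' });

    // ...and one AppConfig environment per workload environment (beta shown here).
    const betaEnv = new CfnEnvironment(this, 'BetaEnv', {
      applicationId: app.ref,
      name: 'beta',
    });
  }
}
```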

Feature Flag definitions

Workload features often span multiple components and components can have dependencies on each other. Traditionally, features are released by deploying the corresponding code changes. As different components are owned by different teams, releasing features that span multiple components traditionally requires coordination between those teams to ensure that components' interdependencies are satisfied. The more teams' components require changes to implement a workload's feature, the more complex the coordination effort to safely and consistently release the feature becomes. This results in longer lead times, lower deployment frequency, and higher change failure rates.

Additionally, modern teams who aim to achieve continuous deployment (i.e. deploy to production in full automation without any manual approvals) prefer trunk-based development, which means that a single branch manages the code that is deployed to all environments. While trunk-based development is a great way to achieve continuous deployment, it, by design, does not allow excluding code from being deployed to production, which in turn makes it challenging to coordinate the timing of releasing cross-component features consistently and safely.

An effective way to solve this problem is to use Feature Flags to separate release from deployment. A Feature Flag is a mechanism that allows teams to enable/disable certain code fragments using a configuration item that is managed outside the codebase it is used in. In its simplest form, a Feature Flag has a name and a boolean value that is used in an if/else statement. For features that span multiple components, the corresponding feature flag can be used in all components. By wrapping code changes for a new feature in a statement that only executes the new code when the feature flag is turned on, and continues to execute the old code when the feature flag is turned off, deploying the code does not release the feature. This is also called a "dark release". Releasing the feature is then done by turning the corresponding feature flag on. This allows multiple teams to deploy changes to their components continuously and independently of each other, while features are released safely using feature flags, as the sketch below illustrates. If a feature turns out to be broken, rolling back does not require deploying a previous version of the code; it is as simple as turning the feature flag off.
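In code, the flag check is just a conditional. A minimal TypeScript sketch follows; all names are hypothetical, and how the flags are loaded (e.g. from AWS AppConfig) is omitted.

```typescript
// A feature flag guarding a new code path: deploying this code does not release
// the feature ("dark release"); flipping the flag does.
type Cart = { items: string[] };

interface FeatureFlags {
  checkoutV2Enabled: boolean; // hypothetical flag, shared by all affected components
}

const checkoutV1 = (cart: Cart) => `v1 order with ${cart.items.length} items`;
const checkoutV2 = (cart: Cart) => `v2 order with ${cart.items.length} items`;

function checkout(cart: Cart, flags: FeatureFlags): string {
  if (flags.checkoutV2Enabled) {
    return checkoutV2(cart); // new behavior, released by turning the flag on
  }
  return checkoutV1(cart);   // old behavior; rollback = turning the flag off
}
```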

Examples of feature flag use cases include introducing new functionality to an existing workload and enabling or disabling functionality in a given workload without the need for re-deployments or restarts. For more details see: Using AWS AppConfig Feature Flags. Feature Flags are managed per environment, which allows features to be released in different environments independently. Feature Flags can be stored in any format, including but not limited to YAML, JSON, and XML.
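As an illustration, per-environment flag definitions could be stored as one file per environment, shown here as TypeScript objects; in practice they would typically be serialized as YAML or JSON. The flag name is the hypothetical one from the sketch above.

```typescript
// Hypothetical per-environment feature flag definitions, managed as code.
export const betaFlags = { checkoutV2Enabled: true };  // released early for testing
export const prodFlags = { checkoutV2Enabled: false }; // still dark in production
```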

Operational Configuration definition

Code that defines Operational Configuration that is deployed and managed by the Dynamic Configuration Pipeline. Operational Configuration is managed per environment, which allows environments to be configured independently of each other. As an example, the default log level may be set to DEBUG in test environments, whereas production environments are set to ERROR to only capture errors in the logs. Using the Dynamic Configuration Pipeline, Operational Configuration can be changed on the fly without redeploying the respective workload. Operational Configuration can be stored in any format, including but not limited to YAML, JSON, and XML.
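Analogously, a sketch of per-environment Operational Configuration (values illustrative; in practice serialized as YAML or JSON, one file per environment):

```typescript
// Hypothetical per-environment operational configuration, managed as code.
export const betaConfig = { logLevel: 'DEBUG', maxRequestsPerSecond: 100 };  // verbose logging in test
export const prodConfig = { logLevel: 'ERROR', maxRequestsPerSecond: 5000 }; // errors only in production
```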

Build

All actions run in this stage are also run on developers' local environments prior to code commit and peer review. Actions in this stage should all run in less than 10 minutes so that developers can act on fast feedback before moving on to their next task. If they take more time, consider decoupling the system to reduce dependencies, optimizing the process, using more efficient tooling, or moving some of the actions to later stages. Each of the actions below is defined and run in code.

Build Code

Convert code into artifacts that can be promoted through environments. Most builds complete in seconds. While the Dynamic Configuration Pipeline does not house application source code that needs to be built, it may contain Infrastructure Source Code that needs to be built, e.g. CDK Source Code.
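For example, if the repository contains CDK Infrastructure Source Code, the build compiles and synthesizes a minimal app entrypoint like the following sketch (DynamicConfigStack is the hypothetical stack from the earlier example):

```typescript
// bin/app.ts — CDK app entrypoint; the Build stage compiles this (tsc) and
// synthesizes it (cdk synth) into a promotable CloudFormation artifact.
import { App } from 'aws-cdk-lib';
import { DynamicConfigStack } from '../lib/dynamic-config-stack';

const app = new App();
new DynamicConfigStack(app, 'DynamicConfig');
```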

Unit Tests

Run the unit tests to verify that the infrastructure as code (IaC) complies with the specification expressed as unit tests, to avoid unintended changes to the infrastructure. These tests are fast-running tests with zero dependencies on external systems, returning results in seconds. In the case of CDK, unit tests are expressed in commonly used programming language-specific unit test frameworks, including but not limited to JUnit, Jest, and pytest. Other IaC technologies such as Terraform have other unit test frameworks and mechanisms that can be leveraged to ensure that IaC is in line with the specification. Test results should be published as artifacts such as AWS CodeBuild Test Reports.
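A sketch of such a unit test with Jest and the CDK assertions module, against the hypothetical stack from the earlier example:

```typescript
// test/dynamic-config-stack.test.ts — fast, dependency-free IaC unit test.
import { App } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import { DynamicConfigStack } from '../lib/dynamic-config-stack';

test('defines a beta AppConfig environment', () => {
  const template = Template.fromStack(new DynamicConfigStack(new App(), 'Test'));
  template.hasResourceProperties('AWS::AppConfig::Environment', { Name: 'beta' });
});
```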

Code Quality

Run various automated static analysis tools that generate reports on code quality, coding standards, security, code coverage, and other aspects according to the team and/or organization’s best practices. AWS recommends that teams fail the build when important practices are violated (e.g., a security violation is discovered in the code). These checks usually run in seconds. Examples of tools to measure code quality include but are not limited to Amazon CodeGuru, SonarQube, black, and ESLint.

Secrets Detection

Identify secrets such as usernames, passwords, and access keys in code. When secrets are discovered, the build should fail immediately. Examples of secret detection tools include but are not limited to GitGuardian and gitleaks.

Static Application Security Testing (SAST)

Analyze code for application security violations such as XML External Entity Processing, SQL Injection, and Cross Site Scripting. Any findings that exceed the configured threshold will immediately fail the build and stop any forward progress in the pipeline. Examples of tools to perform static application security testing include but are not limited to Amazon CodeGuru, SonarQube, and Checkmarx.

Test (Beta)

Testing is performed in a beta environment to validate that the latest code is functioning as expected. This validation is done by first deploying the code and then running integration and end-to-end tests against the deployment. Beta environments will have dependencies on the applications and services from other teams in their gamma environments. All actions performed in this stage should complete within 30 minutes to provide fast feedback.

Deploy Feature Flags

Deploy Feature Flags to the beta environment. Software deployments should be performed through Infrastructure Source Code. Access to the beta environment should be handled via cross-account IAM roles rather than long-lived credentials from IAM users. Examples of tools to define feature flags include but are not limited to AWS AppConfig, Split.io, and LaunchDarkly.
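A sketch of what such a deployment could look like with CDK and AWS AppConfig (illustrative, not the reference implementation; it assumes the `app` and `betaEnv` resources from the earlier stack sketch and runs inside that stack's constructor):

```typescript
import {
  CfnConfigurationProfile,
  CfnDeployment,
  CfnHostedConfigurationVersion,
} from 'aws-cdk-lib/aws-appconfig';

// A feature flag configuration profile backed by AppConfig's hosted store.
const flagsProfile = new CfnConfigurationProfile(this, 'FlagsProfile', {
  applicationId: app.ref,
  name: 'feature-flags',
  locationUri: 'hosted',
  type: 'AWS.AppConfig.FeatureFlags',
});

// A version of the flag document (AppConfig's feature flag JSON format).
const flagsVersion = new CfnHostedConfigurationVersion(this, 'FlagsV1', {
  applicationId: app.ref,
  configurationProfileId: flagsProfile.ref,
  contentType: 'application/json',
  content: JSON.stringify({
    flags: { checkoutV2Enabled: { name: 'checkoutV2Enabled' } },
    values: { checkoutV2Enabled: { enabled: true } },
    version: '1',
  }),
});

// Deploy that version to the beta environment.
new CfnDeployment(this, 'BetaFlagsDeployment', {
  applicationId: app.ref,
  environmentId: betaEnv.ref,
  configurationProfileId: flagsProfile.ref,
  configurationVersion: flagsVersion.ref,
  deploymentStrategyId: strategy.ref, // a CfnDeploymentStrategy; see the Prod section sketch
});
```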

Deploy Operational Configuration

Deploy Operational Configurations to the beta environment. Software deployments should be performed through Infrastructure Source Code. Access to the beta environment should be handled via cross-account IAM roles rather than long-lived credentials from IAM users. Examples of tools to define operational configurations include but are not limited to AWS AppConfig, Split.io, and LaunchDarkly.
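Once deployed, components can pick up the new values without a restart. A sketch of runtime retrieval with the AWS AppConfig Data SDK follows; the identifiers are illustrative.

```typescript
import {
  AppConfigDataClient,
  GetLatestConfigurationCommand,
  StartConfigurationSessionCommand,
} from '@aws-sdk/client-appconfigdata';

const client = new AppConfigDataClient({});

async function loadOperationalConfig(): Promise<Record<string, unknown>> {
  // Start a session, then poll for the latest deployed configuration.
  const session = await client.send(new StartConfigurationSessionCommand({
    ApplicationIdentifier: 'my-workload',
    EnvironmentIdentifier: 'beta',
    ConfigurationProfileIdentifier: 'operational-config',
  }));
  const latest = await client.send(new GetLatestConfigurationCommand({
    ConfigurationToken: session.InitialConfigurationToken,
  }));
  // Note: the payload is empty when configuration has not changed since the last poll.
  return JSON.parse(new TextDecoder().decode(latest.Configuration));
}
```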

Integration Tests

Run automated tests that verify that the application satisfies business requirements. These tests require the workload to be running in the beta environment. Integration tests may come in the form of behavior-driven tests, automated acceptance tests, or automated tests linked to requirements and/or stories in a tracking system. Test results should be published somewhere such as AWS CodeBuild Test Reports. Examples of tools to define integration tests include but are not limited to Cucumber, vRest, and SoapUI.
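A sketch of such a test with Jest against the deployed beta environment (the endpoint and expectations are hypothetical; assumes Node 18+ for the global fetch):

```typescript
// Runs against the live beta deployment, not against mocks.
const BASE_URL = process.env.BETA_BASE_URL ?? 'https://beta.example.com';

test('checkout succeeds with the v2 flag enabled in beta', async () => {
  const response = await fetch(`${BASE_URL}/checkout`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ items: ['sku-123'] }),
  });
  expect(response.status).toBe(200);
});
```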

Test (Gamma)

Testing is performed in a gamma environment to validate that the latest code can be safely deployed to production. The environment is as production-like as possible, including configuration, monitoring, and traffic. Additionally, the environment should match the same regions that the production environment uses. The gamma environment is used by other teams' beta environments and therefore must maintain acceptable service levels to avoid impacting other teams' productivity. All actions performed in this stage should complete within 30 minutes to provide fast feedback.

Deploy Feature Flags

Deploy Feature Flags to the gamma environment. Software deployments should be performed through Infrastructure Source Code. Access to the gamma environment should be handled via cross-account IAM roles rather than long-lived credentials from IAM users.

Deploy Operational Configuration

Deploy Operational Configurations to the gamma environment. Software deployments should be performed through Infrastructure Source Code. Access to the gamma environment should be handled via cross-account IAM roles rather than long-lived credentials from IAM users.

Integration Tests

Run automated tests that verify that the application satisfies business requirements. These tests require the workload to be running in the gamma environment. Integration tests may come in the form of behavior-driven tests, automated acceptance tests, or automated tests linked to requirements and/or stories in a tracking system. Test results should be published somewhere such as AWS CodeBuild Test Reports. Examples of tools to define integration tests include but are not limited to Cucumber, vRest, and SoapUI.

Acceptance Tests

Run automated testing from the users' perspective in the gamma environment. These tests verify the user workflow, including when performed through a UI. These tests are the slowest to run and hardest to maintain, and therefore it is recommended to only have a few end-to-end tests that cover the most important application workflows. Test results should be published somewhere such as AWS CodeBuild Test Reports. Examples of tools to define end-to-end tests include but are not limited to Cypress, Selenium, and Telerik Test Studio.
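A sketch of one such end-to-end test in Cypress (the URL and selectors are hypothetical):

```typescript
// Covers a single critical user workflow end to end through the UI.
describe('checkout workflow', () => {
  it('completes an order', () => {
    cy.visit('https://gamma.example.com');
    cy.contains('Add to cart').click();
    cy.contains('Checkout').click();
    cy.contains('Order confirmed'); // fails if the confirmation never renders
  });
});
```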

Monitoring & Logging

Monitor deployments across regions and fail them when alarm thresholds are breached. The thresholds for metric alarms should be defined in the Infrastructure Source Code and deployed along with the rest of the infrastructure in an environment. Ideally, deployments should be automatically failed and rolled back when error thresholds are breached. Examples of automated rollback include AWS CloudFormation monitor & rollback, AWS CodeDeploy rollback, and Flagger.
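A sketch of such an alarm threshold in CDK (metric names and thresholds are illustrative; the code would live inside the workload's stack):

```typescript
import { Duration } from 'aws-cdk-lib';
import { Alarm, ComparisonOperator, Metric } from 'aws-cdk-lib/aws-cloudwatch';

// Breaching this alarm during a deployment should fail and roll it back.
new Alarm(this, 'ErrorRateAlarm', {
  metric: new Metric({
    namespace: 'MyWorkload',
    metricName: '5xxErrorRate',
    period: Duration.minutes(1),
  }),
  threshold: 1,         // e.g. more than 1% errors
  evaluationPeriods: 3, // sustained for 3 consecutive minutes
  comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
});
```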

Synthetic Tests

Tests that run continuously in the background in a given environment to generate traffic and verify the system is healthy. These tests serve two purposes:

  1. Ensure there is always adequate traffic in the environment to trigger alarms if a deployment is unhealthy
  2. Test specific workflows and assert that the system is functioning correctly.

Examples of tools that can be used for synthetic tests include but are not limited to Amazon CloudWatch Synthetics, Dynatrace Synthetic Monitoring, and Datadog Synthetic Monitoring.
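A sketch of a canary in CDK (the script and runtime version are illustrative; this assumes a recent aws-cdk-lib in which the Synthetics module is stable, with the code placed inside a stack):

```typescript
import { Duration } from 'aws-cdk-lib';
import { Canary, Code, Runtime, Schedule, Test } from 'aws-cdk-lib/aws-synthetics';

// Hypothetical canary script; in practice it would drive a real workflow and assert on it.
const canaryScript = `exports.handler = async () => { /* exercise checkout and assert */ };`;

new Canary(this, 'CheckoutCanary', {
  schedule: Schedule.rate(Duration.minutes(5)), // continuous background traffic
  test: Test.custom({
    code: Code.fromInline(canaryScript),
    handler: 'index.handler',
  }),
  runtime: Runtime.SYNTHETICS_NODEJS_PUPPETEER_7_0, // pick a currently supported runtime
});
```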

Prod

Progressively deploy Feature Flags and Operational Configuration

Deployments should be made progressively in waves to limit the impact of failures. A common approach is to deploy changes to a subset of AWS regions and allow sufficient bake time to monitor performance and behavior before proceeding with additional waves of AWS regions.

Software should be deployed using a progressive deployment strategy involving a controlled rollout of a change through techniques such as canary deployments, feature flags, and traffic shifting. Software deployments should be performed through Infrastructure Source Code. Access to the production environment should be handled via cross-account IAM roles rather than long-lived credentials from IAM users. Examples of tools to deploy software include but are not limited to AWS CodeDeploy. Ideally, deployments should be automatically failed and rolled back when error thresholds are breached. Examples of automated rollback include AWS CloudFormation monitor & rollback, AWS CodeDeploy rollback, and Flagger.
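A sketch of a progressive rollout expressed as an AWS AppConfig deployment strategy in CDK (values illustrative; placed inside a stack):

```typescript
import { CfnDeploymentStrategy } from 'aws-cdk-lib/aws-appconfig';

// Roll the configuration change out to a growing share of targets, then bake.
const strategy = new CfnDeploymentStrategy(this, 'ProgressiveStrategy', {
  name: 'linear-20-percent-bake-30',
  deploymentDurationInMinutes: 60, // spread the rollout over an hour
  growthType: 'LINEAR',
  growthFactor: 20,                // +20% of targets per step
  finalBakeTimeInMinutes: 30,      // watch alarms before marking the deployment complete
  replicateTo: 'NONE',
});
```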

Acceptance Tests

Run automated testing from the users' perspective in the production environment. These tests verify the user workflow, including when performed through a UI. These tests are the slowest to run and hardest to maintain, and therefore it is recommended to only have a few end-to-end tests that cover the most important application workflows. Test results should be published somewhere such as AWS CodeBuild Test Reports. Examples of tools to define end-to-end tests include but are not limited to Cypress, Selenium, and Telerik Test Studio.

Monitoring & Logging

Monitor deployments across regions and fail them when alarm thresholds are breached. The thresholds for metric alarms should be defined in the Infrastructure Source Code and deployed along with the rest of the infrastructure in an environment. Ideally, deployments should be automatically failed and rolled back when error thresholds are breached. Examples of automated rollback include AWS CloudFormation monitor & rollback, AWS CodeDeploy rollback, and Flagger.

Synthetic Tests

Tests that run continuously in the background in a given environment to generate traffic and verify the system is healthy. These tests serve two purposes:

  1. Ensure there is always adequate traffic in the environment to trigger alarms if a deployment is unhealthy
  2. Test specific workflows and assert that the system is functioning correctly.

Examples of tools that can be used for synthetic tests include but are not limited to Amazon CloudWatch Synthetics, Dynatrace Synthetic Monitoring, and Datadog Synthetic Monitoring.