A handful of practices to minimize application configuration issues
In the aftermath of our last production incident, here are a bunch of application configuration practices that have worked well enough for me over the years:
- Version controlled.
- Restricted secrets.
- Default to production.
- Fail fast.
- No logic.
- Staging == production.
- Avoid “isDevEnv?”
- Preconfigured local dev.
Version controlled
Same as code and infrastructure, configuration should be version controlled, so that you know who changed what, when, how and why, and so that changes can be easily reviewed and rolled back.
Ticketing systems to change configuration in production? Meh.
The team that owns the service can already make any code change it wants, so letting it change some configuration is no riskier.
If you really need gatekeeping, the owning team should be able to create pull requests with the required changes for the gatekeepers to approve/merge, reducing the chances of miscommunication.
Restricted secrets
Secrets that everybody can read are not secrets. Reduce your risk by restricting who can see and change which secrets.
Default to production
When adding some new configuration, always set the default to the production value.
This will avoid the most common issue with configuration: “Oh, the app does not start because we forgot to set the configuration for …”
And never ever default to some dev or testing value. Never ever.
Obviously, only default if it is safe. You don’t want to be sending emails to your real clients, connecting to your production data stores and such if some dev forgets to configure their local environment properly.
If it is not safe, the safe choice is no default at all. In that case, add the production configuration as soon as possible; do not wait for the code release.
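A minimal sketch of this rule in Python, assuming configuration comes from environment variables. The setting names `HTTP_TIMEOUT_SECONDS` and `SMTP_HOST`, and the production timeout of 5 seconds, are hypothetical. The timeout is harmless on a dev box, so it defaults to the production value; the mail host is not, so it has no default at all.

```python
from typing import Mapping

# Production value of a setting that is safe to default (hypothetical).
PRODUCTION_HTTP_TIMEOUT_SECONDS = 5


def load_config(env: Mapping[str, str]) -> dict:
    """Read settings, defaulting only where the production value is safe."""
    return {
        # Safe: default to the production value, never to a dev value.
        "http_timeout_seconds": int(
            env.get("HTTP_TIMEOUT_SECONDS", str(PRODUCTION_HTTP_TIMEOUT_SECONDS))
        ),
        # Unsafe: a default could make a dev box email real clients, so there
        # is none. A missing SMTP_HOST raises KeyError instead.
        "smtp_host": env["SMTP_HOST"],
    }
```

At start time you would call `load_config(os.environ)`; a developer who forgets `SMTP_HOST` gets an error instead of a connection to the real mail server.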
Fail fast
Validate the configuration at start time, and crash the application if any value is missing or has the wrong type.
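A minimal fail-fast sketch in Python, again assuming environment variables. `DATABASE_URL`, `PORT` and `FEATURE_X_ENABLED` are hypothetical settings, and `ConfigError` and the helper functions are made up for illustration. Every value is parsed into its expected type up front, all problems are reported together, and the process exits before doing any work if anything is wrong.

```python
import os
import sys
from dataclasses import dataclass
from typing import Callable, Mapping


class ConfigError(Exception):
    """Raised when configuration is missing or malformed."""


@dataclass(frozen=True)
class Config:
    # Hypothetical settings; field names match the lowercased variable names.
    database_url: str
    port: int
    feature_x_enabled: bool


def _parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in ("true", "false"):
        return value == "true"
    raise ValueError(f"expected 'true' or 'false', got {raw!r}")


# Variable name -> parser that converts the raw string to the expected type.
PARSERS: dict[str, Callable[[str], object]] = {
    "DATABASE_URL": str,
    "PORT": int,
    "FEATURE_X_ENABLED": _parse_bool,
}


def load_config(env: Mapping[str, str]) -> Config:
    """Parse every setting, collecting all problems before failing."""
    errors = []
    values = {}
    for name, parse in PARSERS.items():
        raw = env.get(name)
        if raw is None:
            errors.append(f"{name} is missing")
            continue
        try:
            values[name.lower()] = parse(raw)
        except ValueError as exc:
            errors.append(f"{name} is invalid: {exc}")
    if errors:
        raise ConfigError("; ".join(errors))
    return Config(**values)


def load_config_or_exit() -> Config:
    """Call once at process start so bad configuration crashes immediately."""
    try:
        return load_config(os.environ)
    except ConfigError as exc:
        # sys.exit with a message prints it to stderr and exits with status 1.
        sys.exit(f"Invalid configuration: {exc}")
```

Reporting every problem in one go saves the "fix one variable, redeploy, discover the next one" loop.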
No logic
Avoid any logic that manipulates configuration, such as string manipulation, URL building or splitting.
It is unlikely that you will be able to test it, and if the logic fails in production it will be harder to find out why, especially for secret values.
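To illustrate, here is a hypothetical Python contrast between building a URL from pieces and reading the complete URL from configuration. `ORDERS_HOST`, `ORDERS_PORT` and `ORDERS_URL` are made-up names.

```python
from typing import Mapping


# Avoid: the URL is assembled at runtime, so a stray scheme, slash or port
# mistake only shows up when this environment actually calls the service.
def orders_url_with_logic(env: Mapping[str, str]) -> str:
    return "https://" + env["ORDERS_HOST"] + ":" + env["ORDERS_PORT"] + "/api/v1/orders"


# Prefer: the configuration holds the complete value, exactly as it is used,
# so what is reviewed in version control is what runs.
def orders_url(env: Mapping[str, str]) -> str:
    return env["ORDERS_URL"]
```

For example, if someone sets `ORDERS_HOST` to `https://orders.internal` and `ORDERS_PORT` to `443`, the first function returns `https://https://orders.internal:443/api/v1/orders`, and nobody notices until that environment makes a request. The second function has nothing to get wrong.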
Staging == production
The principle of making staging as similar as possible to production applies to configuration too.
This is especially important if you broke the “no logic” principle.
Avoid “isDevEnv?”
Configuration usually starts with a simple “environment” variable, which ends up sprinkling your codebase with “isDevEnv?” conditionals.
But soon enough, your app will run in new environments (CI, test, staging, demo, training, load testing, hotbugfix-33, KafkaPoC, other teams’ local dev), and it will no longer be clear what is dev and what is not.
So prefer explicit configuration variables, one per behavior, instead of implicitly coupling a bunch of behaviors to a single configuration variable.
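A sketch of the alternative in Python: instead of one `ENVIRONMENT` variable behind an `is_dev_env` check, each behavior gets its own flag that defaults to its production value. The flag names (`SEED_FAKE_DATA`, `VERBOSE_SQL_LOGGING`, `ENABLE_DEBUG_TOOLBAR`) are hypothetical, chosen because their production value is also a safe default.

```python
from dataclasses import dataclass
from typing import Mapping


# Avoid: one variable implicitly switches many behaviors, and it is unclear
# what "ci", "demo" or "hotbugfix-33" should count as.
def is_dev_env(env: Mapping[str, str]) -> bool:
    return env.get("ENVIRONMENT") == "dev"


# Prefer: one explicit (hypothetical) flag per behavior.
@dataclass(frozen=True)
class Behaviors:
    seed_fake_data: bool
    verbose_sql_logging: bool
    enable_debug_toolbar: bool


def _flag(env: Mapping[str, str], name: str, production_value: bool) -> bool:
    """Read a boolean flag, defaulting to its production value."""
    raw = env.get(name)
    if raw is None:
        return production_value
    value = raw.strip().lower()
    if value not in ("true", "false"):
        # Fail fast on typos instead of silently treating them as false.
        raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")
    return value == "true"


def load_behaviors(env: Mapping[str, str]) -> Behaviors:
    return Behaviors(
        seed_fake_data=_flag(env, "SEED_FAKE_DATA", False),
        verbose_sql_logging=_flag(env, "VERBOSE_SQL_LOGGING", False),
        enable_debug_toolbar=_flag(env, "ENABLE_DEBUG_TOOLBAR", False),
    )
```

A new environment such as hotbugfix-33 then behaves like production unless it explicitly opts into something, and every difference from production is spelled out in its configuration.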
Preconfigured local dev
Given that most configuration defaults to production values, a big chunk of your configuration exists just for the local development environment, so it is important to make that configuration easy to set up, add to and update on every developer’s box.
My preferred approach is still a dockerized local environment.
Less config, less chance for bugs. Same as code! YAGNI!
In case you wonder, in our production incident we committed three sins: defaulted to a dev value, logic in config, and staging different from production.
Why sin once when you can do it thrice for the same price?