# Zero-downtime infrastructure
This document describes the infrastructure building blocks that enable zero-downtime operations for the Provider Data API (legacy). These building blocks can be combined to achieve various zero-downtime scenarios such as cache rotation and database switchover.
## Overview
The Provider Data API (legacy) is designed to support zero-downtime maintenance operations through a combination of:
- Dual Helm releases (stable and canary) that can run simultaneously
- Canary-weight traffic splitting via NGINX ingress canary annotations
- Per-release secrets for independent database/configuration
- Configurable cache key prefixes to isolate cache namespaces
These building blocks allow operators to prepare a new configuration (cache, database snapshot, etc.) on one release while the other continues serving production traffic, then switch traffic seamlessly.
## Building block 1: Dual Helm releases

### How it works
The Helm chart supports deploying two independent releases into the same namespace:
| Release Name | Role | Hostname Suffix | Data Secret |
|---|---|---|---|
| `providers-app` | stable (primary) | `-1` | `app-secrets` |
| `pdl-2` | canary (secondary) | `-2` | `app-secrets-secondary` |
Each release creates its own:
- Deployment (with separate pods)
- Service
- Dedicated ingress (with hostname suffix, e.g., `*-uat-1`, `*-uat-2`)
Both releases also participate in the shared “PDA” ingress for traffic splitting.
### Helm values configuration
The `canary.role` value, set in the `values.yaml` files deployed with the Helm chart, determines how each release participates:
```yaml
# values.yaml excerpt
canary:
  # Role of this release in canary deployment:
  #   - none:   No shared ingress (dedicated ingress only)
  #   - stable: Base ingress for canary split (receives [100 - canary-weight]%)
  #   - canary: Canary ingress for traffic testing (receives [canary-weight]%)
  role: "none"

  # Percentage of traffic to route to the canary release (0-100)
  weight: 0
```
### Deploy workflow configuration
The GitHub Actions workflow `rw-pdl-deploy-main.yml` derives the role from the name of the Helm release being deployed:
```bash
if [[ "${RELEASE}" == "${DEFAULT_RELEASE}" ]]; then
  # Primary release (providers-app)
  RELEASE_SUFFIX="-1"
  CANARY_ROLE="stable"
  DATACFG_SECRET="app-secrets"
elif [[ "${RELEASE}" == "${SECONDARY_RELEASE}" ]]; then
  # Secondary release (pdl-2)
  RELEASE_SUFFIX="-2"
  CANARY_ROLE="canary"
  DATACFG_SECRET="app-secrets-secondary"
fi
```
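The same selection can be sketched in Python. The release and secret names come from the workflow snippet above; the explicit error for unrecognized release names is an addition (the shell `if`/`elif` has no `else` branch, so an unknown name would silently get no role):

```python
# Sketch of the role-selection logic in rw-pdl-deploy-main.yml.
# Names mirror the shell snippet above; raising on an unknown release
# is an assumption added for clarity.
DEFAULT_RELEASE = "providers-app"
SECONDARY_RELEASE = "pdl-2"

def release_config(release: str) -> dict:
    if release == DEFAULT_RELEASE:
        return {"suffix": "-1", "role": "stable", "secret": "app-secrets"}
    if release == SECONDARY_RELEASE:
        return {"suffix": "-2", "role": "canary", "secret": "app-secrets-secondary"}
    raise ValueError(f"unknown release: {release}")
```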
### Deploying a secondary release
To deploy the secondary/canary release manually, trigger the GitHub Actions workflow dispatch (which runs a command like the one below) and name the release `pdl-2`:
```bash
# Via GitHub Actions workflow dispatch with rel=pdl-2
helm upgrade --install pdl-2 helm_deploy/providers-app \
  -f helm_deploy/providers-app/values-uat.yaml \
  --set-string "canary.role=canary" \
  --set-string "canary.weight=0" \
  --set-string "releaseSuffix=-2" \
  --set-string "secretNames.dataConfig=app-secrets-secondary" \
  -n laa-data-provider-data-uat
```
## Building block 2: Traffic splitting with canary ingress

### How it works
The shared “PDA” ingress (`ingress-pda.yaml`) uses NGINX canary annotations to split traffic between the stable and canary releases:
- The stable release creates the base ingress (receives `[100 - canary_weight]%`)
- The canary release creates a canary ingress (receives `[canary_weight]%`)
```yaml
# ingress-pda.yaml excerpt
{{- if eq (.Values.canary).role "canary" }}
nginx.ingress.kubernetes.io/canary: "true"
nginx.ingress.kubernetes.io/canary-weight: "{{ (.Values.canary).weight }}"
{{- end }}
```
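The annotation only sets a percentage; NGINX decides per request which backend receives it, so over many requests the proportions converge on the configured weight. A minimal sketch of the expected proportions only (not NGINX's actual, probabilistic routing algorithm):

```python
def split(total_requests: int, canary_weight: int) -> tuple[int, int]:
    """Expected (stable, canary) request counts at a given canary-weight.

    Illustrates the [100 - weight]% / [weight]% proportions only; real
    NGINX canary routing decides probabilistically per request.
    """
    canary = total_requests * canary_weight // 100
    return total_requests - canary, canary
```

For example, at `canary_weight=10`, roughly 100 of every 1000 requests reach the canary release.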
### Switching traffic with kubectl
You can change traffic distribution without redeploying by patching the canary ingress annotation:
```bash
# Send 0% to canary (all traffic to stable)
kubectl -n laa-data-provider-data-uat patch ingress pdl-2-pda \
  -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"0"}}}'

# Send 10% to canary (for testing)
kubectl -n laa-data-provider-data-uat patch ingress pdl-2-pda \
  -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"10"}}}'

# Send 100% to canary (full switchover)
kubectl -n laa-data-provider-data-uat patch ingress pdl-2-pda \
  -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"100"}}}'
```
## Building block 3: Per-release secrets

### How it works
Each release can read database credentials and configuration from different Kubernetes secrets. This is configured via the `secretNames.dataConfig` Helm value:
| Release | Secret Name | Purpose |
|---|---|---|
| `providers-app` (stable) | `app-secrets` | Primary database connection |
| `pdl-2` (canary) | `app-secrets-secondary` | Secondary/alternative database connection |
### Deployment template

The `deployment.yaml` template references the secret name dynamically:
```yaml
# deployment.yaml excerpt
env:
  - name: CWA_DB_URL
    valueFrom:
      secretKeyRef:
        name: {{ (.Values.secretNames).dataConfig }}
        key: CWA_DB_URL
  - name: CWA_DB_USER
    valueFrom:
      secretKeyRef:
        name: {{ (.Values.secretNames).dataConfig }}
        key: CWA_DB_USER
  # ... similar for CWA_DB_PASSWORD and CCMS_DB_* values
```
### Creating the secondary secret

Before deploying the canary release with a different database configuration, create the secondary secret containing the alternative connection details, for example via the AWS Console; the value is then synchronized to the Cloud Platform Kubernetes secret.
## Building block 4: Configurable cache key prefixes

### How it works
The application uses Redis for caching, with a configurable prefix on all cache keys set in `application.yml`. This allows multiple “namespaces” of cached data to coexist in the same Redis instance.
```yaml
# application.yml excerpt
app:
  cache:
    prefix:
      allowed: ",b,g,p"  # Empty string, 'b', 'g', 'p' are valid prefixes
      key: "primary"     # Redis key storing the active prefix
```
### Cache key format

Cache keys are computed by `ComputedCacheKeyPrefix`:
| Active Prefix | Cache Name | Resulting Key Pattern |
|---|---|---|
| (empty) | `ProviderFirms` | `ProviderFirms::*` |
| `b` | `ProviderFirms` | `b::ProviderFirms::*` |
| `g` | `ProviderFirms` | `g::ProviderFirms::*` |
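The key patterns above can be reproduced with a small sketch (the actual `ComputedCacheKeyPrefix` implementation is not shown here; this only mirrors the documented format):

```python
def cache_key(active_prefix: str, cache_name: str, entry_key: str) -> str:
    """Build a Redis cache key per the documented pattern.

    An empty active prefix yields 'CacheName::key'; otherwise
    'prefix::CacheName::key'.
    """
    parts = [active_prefix, cache_name, entry_key] if active_prefix else [cache_name, entry_key]
    return "::".join(parts)
```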
### How the active prefix is determined

- The active prefix is stored in Redis under the key configured by `app.cache.prefix.key` (default: `primary`)
- Each pod queries Redis every 30 seconds to refresh the cached active prefix
- If no value is set, the first value in `app.cache.prefix.allowed` is used as the default
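The resolution rules can be sketched as follows (the Redis lookup is stubbed with a plain dict; defaults follow the `application.yml` excerpt above):

```python
def resolve_active_prefix(store: dict, allowed: str = ",b,g,p", key: str = "primary") -> str:
    """Return the active cache prefix per the rules above.

    `allowed` is the comma-separated app.cache.prefix.allowed value; its
    leading comma means the empty prefix is valid and comes first, so it
    is the default. `store` stands in for Redis (each pod re-reads it
    every 30 seconds).
    """
    allowed_values = allowed.split(",")  # ",b,g,p" -> ["", "b", "g", "p"]
    # If Redis has no value under `key`, fall back to the first allowed value
    return store.get(key, allowed_values[0])
```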
### Admin API endpoints
The application provides admin endpoints to manage cache prefixes:
```bash
# Get the currently active prefix
curl -X GET "https://api-host/admin/cache/prefix" \
  -H "X-Authorization: Bearer $TOKEN"

# Set the active prefix (all pods will use this within 30 seconds)
curl -X POST "https://api-host/admin/cache/prefix?prefix=b" \
  -H "X-Authorization: Bearer $TOKEN"

# Force cache reload into a specific prefix (without changing the active one)
curl -X POST "https://api-host/admin/cache/force-reload?reason=manual&prefix=g" \
  -H "X-Authorization: Bearer $TOKEN"

# Clear cache for a specific prefix
curl -X POST "https://api-host/admin/cache/clear?reason=testing&prefix=b" \
  -H "X-Authorization: Bearer $TOKEN"
```
### Cache status
Check the status of a cache namespace:
```bash
curl -X GET "https://api-host/admin/cache/status?prefix=b" \
  -H "X-Authorization: Bearer $TOKEN"
```
## Summary of building blocks
| Building Block | Mechanism | Controlled By |
|---|---|---|
| Dual releases | Helm release names (`providers-app`, `pdl-2`) | GitHub workflow `rel` input |
| Traffic splitting | NGINX canary ingress annotation | `kubectl patch` |
| Per-release config | Kubernetes secrets per release | `secretNames.dataConfig` Helm value |
| Cache isolation | Redis key prefix | Admin API or Redis directly |
## Next steps
- Zero-downtime cache rotation - How to reload the cache without service interruption
- Zero-downtime database switchover - How to switch between database snapshots