Enterprise Secrets Management with SOPS: A Production-Ready Implementation
How to build a scalable, secure, and auditable secrets management system using SOPS, age encryption, and GitHub Actions
📦 Full implementation available: All code, scripts, and examples from this post are available at github.com/cgoolsby/sops-for-companies
Introduction
Managing secrets in a corporate environment is one of those challenges that keeps security engineers up at night. You need to balance security with usability, ensure proper access control, maintain audit trails, and make it all work seamlessly with your CI/CD pipelines. After implementing secrets management systems at several organizations, I’ve found that SOPS (Secrets OPerationS) combined with age encryption provides an elegant solution that checks all these boxes.
In this post, I’ll walk you through building a production-ready secrets management system that handles employee onboarding/offboarding, role-based access control, automated secret rotation, and GitHub Actions integration. We’ll use YAML anchors for efficient configuration management and implement comprehensive audit logging throughout.
Why SOPS?
Before diving into the implementation, let’s address why SOPS stands out among the numerous secrets management solutions available:
The Problem Space
Traditional approaches to secrets management often fall into these traps:
- Plain text in environment variables: Zero security, high risk
- Encrypted files with shared passwords: Poor access control, no audit trail
- Cloud provider secret stores: Vendor lock-in, complex local development
- HashiCorp Vault: Excellent but requires infrastructure and operational overhead
Enter SOPS
SOPS hits the sweet spot by:
- Encrypting values, not field names: Keys in your YAML or JSON stay readable, so you can still see the structure of your configuration
- Git-friendly: Encrypted files are text-based and diff-able
- Multiple key backends: Works with cloud KMS services (AWS KMS, GCP KMS, Azure Key Vault) as well as age and GPG
- Minimal infrastructure: No servers to maintain
- Developer-friendly: Integrates with existing workflows
Why age Over GPG?
While SOPS supports multiple encryption backends, we chose age for several reasons:
# age key generation is simple
$ age-keygen -o key.txt
Public key: age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p
# Compare to GPG's complexity
$ gpg --full-generate-key # Multiple prompts, expiry dates, subkeys...
Age advantages:
- Simplicity: No key expiry, web of trust, or subkeys to manage
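- Modern cryptography: X25519 key agreement and ChaCha20-Poly1305 encryption, with no algorithm or cipher settings to misconfigure
- Small key format: Keys are short text strings (public keys start with age1, private keys with AGE-SECRET-KEY-1)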
- No agent required: No gpg-agent or keyring complexities
Architecture Design
Role-Based Access Model
Our implementation defines three distinct roles with different access levels:
# Simplified view of our access matrix
developers:
  - development: ✅
  - staging: ❌
  - production: ❌
  - examples: ✅ (testing)
administrators:
  - development: ✅
  - staging: ✅
  - production: ✅
  - examples: ✅
ci_cd:
  - development: ✅ (deploy)
  - staging: ✅ (deploy)
  - production: ✅ (deploy with approval)
This model follows the principle of least privilege while maintaining operational efficiency.
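The matrix can also be expressed as a small lookup, for example inside a verification script. The sketch below is hypothetical: role_can_access is not part of the repository, and the role and environment names are simply copied from the matrix above. Anything not listed is denied, which mirrors least privilege.

```shell
#!/bin/sh
# Hypothetical helper mirroring the access matrix above.
# Usage: role_can_access <role> <environment>; exit status 0 means "allowed".
role_can_access() {
  case "$1:$2" in
    developers:development|developers:examples) return 0 ;;
    administrators:development|administrators:staging) return 0 ;;
    administrators:production|administrators:examples) return 0 ;;
    # CI/CD may decrypt for deployment; the production approval step is
    # enforced separately by GitHub environment protection rules
    ci_cd:development|ci_cd:staging|ci_cd:production) return 0 ;;
    *) return 1 ;;  # deny by default (least privilege)
  esac
}

for env in development staging production examples; do
  if role_can_access developers "$env"; then
    echo "developers -> $env: allowed"
  else
    echo "developers -> $env: denied"
  fi
done
```

A verification script could loop over every role and environment in the same way and compare the answers against the recipients actually present in each encrypted file.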
The Power of YAML Anchors
One of the clever features of our implementation is using YAML anchors in .sops.yaml to create reusable key groups. Here’s how it works:
# Define reusable key groups using YAML anchors
keys:
  developers: &developers
    - &alice_key age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p
    - &bob_key age1cy0su9fwf8gzkdqh3r4r6xgc92fp8jqrjp4fvd4ak6vd3mc0jjpqnhymkw
  administrators: &administrators
    - &admin1_key age1yx3z8r0hnzjy9wh6fq5gldq3p7hxg6nfkz5vgqcdqhsj8tqxj8xq8w6qur
    - &admin2_key age1gwehlaawmwzqnn43gp0q6mmxxw8vj64sxz3jc85c2g0lyq5cy3kqpkjz9g
# Reference anchors in creation rules
creation_rules:
  - path_regex: secrets/dev/.*\.enc\.yaml$
    key_groups:
      - age:
          # Keys are referenced individually because a YAML alias cannot
          # concatenate two lists (*developers plus *administrators)
          - *alice_key  # Developer keys
          - *bob_key
          - *admin1_key # Plus admin keys
          - *admin2_key
This approach provides several benefits:
- DRY principle: Define each key once
- Clear organization: Group keys by role
- Easy updates: Change a key in one place
- Reduced errors: No copy-paste mistakes
Directory Structure
Our repository structure reflects the security boundaries:
sops-for-companies/
├── .sops.yaml # Central configuration
├── keys/
│ ├── developers/ # Public keys only
│ ├── administrators/
│ └── ci/
├── secrets/
│ ├── dev/ # Environment isolation
│ ├── staging/
│ └── production/
├── scripts/ # Automation tooling
└── .github/workflows/ # CI/CD integration
Implementation Deep Dive
Employee Onboarding Automation
The onboarding script (scripts/onboard.sh) automates the entire process of adding a new employee:
#!/usr/bin/env bash
# Simplified excerpt from onboard.sh
# Script supports both interactive and non-interactive modes
# 1. Collect employee details (interactive mode)
read -r -p "Enter employee name: " employee_name
read -r -p "Select role (1=Developer, 2=Administrator): " role_choice
case "$role_choice" in
  1) role="developer" ;;
  2) role="administrator" ;;
  *) echo "Invalid role choice" >&2; exit 1 ;;
esac
# 2. Handle key provisioning
echo "1) Enter existing age public key"
echo "2) Generate new age keypair"
read -r -p "Choice: " key_choice
if [ "$key_choice" = "2" ]; then
  # Generate a new keypair. stderr is captured too because age-keygen prints
  # the "Public key: age1..." line there when stdout is not a terminal
  keypair_output=$(age-keygen 2>&1)
  public_key=$(echo "$keypair_output" | grep "Public key:" | cut -d' ' -f3)
  private_key=$(echo "$keypair_output" | grep "AGE-SECRET-KEY")
  echo "IMPORTANT: Save this private key securely:"
  echo "$private_key"
else
  read -r -p "Enter age public key: " public_key
fi
# 3. Add key to .sops.yaml (helper defined elsewhere in onboard.sh; it uses
#    awk rather than sed so the YAML anchors are handled correctly)
add_key_to_config "$employee_name" "$public_key" "$role"
# 4. Re-encrypt all secrets so the new key can decrypt them
find secrets -name "*.enc.yaml" | while read -r secret; do
  sops updatekeys -y "$secret"
done
# 5. Commit the config change and the re-encrypted secrets together
git add .sops.yaml keys/ secrets/
git commit -m "chore: onboard $employee_name as $role"
# Non-interactive mode example:
# ./onboard.sh --name alice --role developer --key age1... --non-interactive
Key features:
- Interactive and CLI modes: Supports both manual and automated workflows
- Key generation: Optional automatic keypair creation
- Automatic re-encryption: All secrets updated with new access
- Git integration: Changes tracked with meaningful commits
- Cross-platform compatibility: Works on macOS, Linux, and WSL
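When an employee supplies an existing public key (option 1 in step 2), a quick format check catches truncated or mistyped keys before they reach .sops.yaml. This is a minimal sketch, not code from onboard.sh: is_valid_age_pubkey is a hypothetical name, and it checks only the shape of a standard X25519 recipient (the age1 prefix followed by 58 Bech32 characters), not the Bech32 checksum. Full validation happens when sops parses the key.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a pasted age public key.
# Standard X25519 recipients are "age1" + 58 lowercase Bech32 characters
# (52 data characters for the 32-byte key plus a 6-character checksum),
# 62 characters in total. The Bech32 alphabet excludes 1, b, i and o.
# The checksum itself is NOT verified here.
is_valid_age_pubkey() {
  # The length test also rejects values containing embedded newlines
  [ "${#1}" -eq 62 ] &&
    printf '%s\n' "$1" | grep -Eq '^age1[qpzry9x8gf2tvdw0s3jn54khce6mua7l]{58}$'
}

key="age1ql3z7hjy54pw3hyww5ayyfg7zqgvc7w3j2elw8zmrj2kg5sfn9aqmcac8p"
if is_valid_age_pubkey "$key"; then
  echo "format OK"
else
  echo "not an age public key" >&2
fi
```

In an onboarding flow, such a check would sit right after reading the key, exiting with an error instead of calling add_key_to_config.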
Employee Offboarding with Security
Offboarding (scripts/offboard.sh) is even more critical from a security perspective:
# Key offboarding steps
# 1. Identify affected secrets
accessible_secrets=$(find_employee_accessible_secrets "$employee_name")
# 2. Remove key from configuration
remove_key_from_config "$employee_name"
# 3. Re-encrypt every secret without the removed key, then generate a new
#    data key (-r) so a data key extracted earlier no longer decrypts anything
find secrets -name "*.enc.yaml" | while read -r secret; do
  sops updatekeys -y "$secret"
  sops -r -i "$secret"
done
# 4. Optional secret rotation for critical environments
if [[ "$rotate_choice" == "yes" ]]; then
  for secret in $production_secrets; do
    rotate_secret "$secret"
  done
fi
# 5. Audit logging (portable ISO 8601 UTC timestamp for both BSD and GNU date)
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) - Offboarded: $employee_name" >> offboarding_audit.log
Security considerations:
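- Revocation going forward: After the re-encrypted files are committed, the removed key cannot decrypt any new version of a secret
- Data key rotation: sops updatekeys re-wraps the existing data key rather than replacing it, so follow it with sops -r -i to generate a fresh data key the former employee never had
- Secret rotation: Option to rotate the underlying secret values, which the employee may already have seen in plaintext
- Audit trail: Complete record of who was removed when
- Git history is unchanged: Older commits are still encrypted to the removed key, so the former employee can decrypt every version they previously had access to; rotating the secret values is what actually closes that gap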
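The find_employee_accessible_secrets helper used in step 1 is not shown. One way to build something similar, sketched below under the assumption that you already have the employee's public key (for example from keys/developers/), is to search the SOPS metadata: each encrypted YAML file lists its age recipients in plaintext as recipient: entries inside the sops: block, so a fixed-string search finds the files that key can open. The function name files_encrypted_to is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list encrypted files whose SOPS metadata names the key.
# Usage: files_encrypted_to <age-public-key> [secrets-dir]
files_encrypted_to() {
  pubkey="$1"
  dir="${2:-secrets}"
  # SOPS stores age recipients in plaintext metadata, e.g.
  #   sops:
  #       age:
  #           - recipient: age1...
  # -r recurse, -l print file names only, -F treat the pattern as a fixed string.
  # Options come before "--" so BSD grep (macOS) parses them too.
  grep -rlF --include='*.enc.yaml' -- "recipient: $pubkey" "$dir" | sort
}
```

Running the same search against an older checkout (for example a worktree of a past commit) shows which historical versions the key could still open, which is a useful input to the rotation decision.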
Secret Rotation Strategy
Our rotation script (scripts/rotate-secrets.sh) provides both automated and manual rotation:
# Automated rotation example
rotate_database_secret() {
  local secret_file="$1"
  local new_password
  # Generate new credentials (generate_password is a helper in the script;
  # its output must not need JSON escaping, e.g. letters and digits only)
  new_password=$(generate_password 32)
  # Update the value in place: sops decrypts in memory, sets the value and
  # re-encrypts for the recipients already in the file. No plaintext copy lands
  # in /tmp, and the secrets/ creation rules never have to match a temp path.
  sops --set "[\"data\"][\"password\"] \"$new_password\"" "$secret_file"
  # Log rotation; the UTC timestamp serves as rotation metadata
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | Rotated $secret_file" >> rotation.log
}
Rotation triggers:
- Scheduled: Quarterly rotation for compliance
- Event-based: After employee offboarding
- On-demand: Security incidents or suspected compromise
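The rotation function relies on a generate_password helper that the excerpt does not show. A minimal portable version might look like the sketch below; the name matches the excerpt, but the body is an assumption. It keeps only letters and digits from /dev/urandom, so the value can be spliced into YAML or JSON expressions without escaping.

```shell
#!/bin/sh
# Sketch of a generate_password helper (the body is an assumption).
# Usage: generate_password [length]   (default 32, practical up to a few hundred)
generate_password() {
  length="${1:-32}"
  # Read a fixed 4 KiB of randomness; roughly 990 alphanumeric bytes survive
  # the filter. Reading a fixed amount means no process in the pipeline is
  # killed early by SIGPIPE, which matters under "set -o pipefail".
  # LC_ALL=C makes tr treat the input as raw bytes; -dc deletes everything
  # NOT in the alphanumeric set.
  head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-$length"
}

generate_password 32
```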
CI/CD Integration
GitHub Actions Setup
Our GitHub Actions integration provides three critical workflows:
1. Pull Request Validation
# .github/workflows/validate-secrets.yml
name: Validate Secrets
on:
  pull_request:
    paths:
      - 'secrets/**/*.enc.yaml'
      - '.sops.yaml'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check all secrets are encrypted
        run: |
          # Every SOPS-encrypted YAML file has a top-level "sops:" metadata block
          for file in $(find secrets -name "*.enc.yaml"); do
            if ! grep -q "^sops:" "$file"; then
              echo "ERROR: $file is not encrypted!"
              exit 1
            fi
          done
      - name: Validate key references
        run: |
          # Ensure all referenced keys exist
          # Check for orphaned keys
          # Validate access patterns
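The "Validate key references" step is left as a placeholder. As one possible starting point, the sketch below flags orphaned keys: anchors of the form &name_key that no creation rule references as *name_key. It is a grep/sed heuristic rather than a YAML parser, and both the function name and the _key naming convention it relies on are assumptions taken from the example configuration.

```shell
#!/bin/sh
# Hypothetical check: print key anchors defined in .sops.yaml but never aliased.
# Relies on the "&<name>_key" naming convention from the example config.
find_orphaned_keys() {
  config="${1:-.sops.yaml}"
  grep -oE '&[A-Za-z0-9_]+_key' "$config" | sed 's/^&//' | sort -u |
    while read -r anchor; do
      # An alias is "*<anchor>" followed by a non-name character or end of line
      if ! grep -qE "\\*${anchor}([^A-Za-z0-9_].*)?\$" "$config"; then
        echo "$anchor"
      fi
    done
}
```

In the workflow, the step could run find_orphaned_keys and fail when it prints anything, for example with out=$(find_orphaned_keys) followed by a test that $out is empty.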
2. Secret Deployment
# .github/workflows/deploy-secrets.yml
name: Deploy Secrets
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        required: true
        options: [development, staging, production]
      target:
        type: choice
        required: true
        options: [kubernetes, aws-secrets-manager, azure-keyvault]
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Approval comes from protection rules configured on the GitHub environment
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Setup decryption
        env:
          SOPS_AGE_KEY: ${{ secrets.SOPS_AGE_KEY }}
        run: |
          # Decrypt and deploy to target platform
Key features:
- Environment protection: Production requires approval
- Multi-platform support: Deploy anywhere
- Audit trail: All deployments logged
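One detail the deploy step has to handle: the workflow's environment input says development, while the directory layout uses secrets/dev. The sketch below shows a small translation the run: block could start with; secrets_dir_for is a hypothetical name, and unknown inputs fail loudly rather than falling back to some default directory.

```shell
#!/bin/sh
# Hypothetical mapping from the workflow's environment input to a secrets/ directory.
secrets_dir_for() {
  case "$1" in
    development) echo "secrets/dev" ;;
    staging|production) echo "secrets/$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

secrets_dir_for development
```

Inside the workflow this would be used as dir=$(secrets_dir_for "${{ inputs.environment }}"), after which each *.enc.yaml file in "$dir" can be decrypted with sops -d.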
3. Weekly Audit
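# .github/workflows/audit-keys.yml
name: Audit Keys
on:
  schedule:
    - cron: '0 9 * * 1' # Weekly on Mondays at 09:00 UTC
jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
    steps:
      - uses: actions/checkout@v4
      - name: Analyze key usage
        id: analyze
        run: |
          # Find unused keys
          # Check access patterns
          # Validate security posture
          # Report a one-line summary of any problems, e.g.:
          # echo "findings=2 unused keys" >> "$GITHUB_OUTPUT"
      - name: Create issue if problems found
        if: steps.analyze.outputs.findings != ''
        uses: actions/github-script@v7
        env:
          AUDIT_REPORT: ${{ steps.analyze.outputs.findings }}
        with:
          script: |
            await github.rest.issues.create({
              ...context.repo,
              title: '⚠️ Security Audit Findings',
              body: process.env.AUDIT_REPORT
            })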
Setting Up the CI/CD Key
The GitHub Actions setup requires a dedicated CI/CD keypair:
# 1. Generate CI/CD keypair
$ age-keygen -o ci-key.txt
Public key: age1wqer098upgs5y5xgm8qgve0dg86j8gzmupqh9lw5w5hhkqwqcpkq2djzk5
# 2. Add public key to .sops.yaml under 'ci' group
# 3. Add private key to GitHub Secrets as SOPS_AGE_KEY
# 4. Configure environment protection rules
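Important: Use the CI/CD key only for decryption. age cannot enforce this by itself: encrypting requires only the public keys already listed in .sops.yaml, so anyone, automated systems included, can produce a validly encrypted file. Enforce the rule with repository controls instead: give deployment workflows read-only access to repository contents and require reviewed pull requests for changes under secrets/ and to .sops.yaml.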
Operational Excellence
Testing and Validation
The repository includes comprehensive testing capabilities:
- full-tests.sh: Automated test suite for all scripts with visual feedback
- verify-access.sh: Validates user access permissions
- Scripts CLI mode: All scripts support --non-interactive for automation
Monitoring and Alerting
Key metrics to track:
- Key age: Alert when keys exceed 90 days
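- Access attempts: Failed decryption attempts, where they are observable (CI job logs, cloud KMS audit logs); local age decryption leaves no central record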
- Rotation compliance: Secrets not rotated on schedule
- Orphaned keys: Keys in config but not used
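For the key-age metric, one simple data source is Git itself: the commit that first added a public key file dates the key. The sketch below assumes one public-key file per person under keys/ (as in the directory layout) and uses git log --diff-filter=A to find that commit; key_age_days and report_old_keys are hypothetical names, and the 90-day default is the threshold from the list above.

```shell
#!/bin/sh
# Hypothetical key-age report built on Git history.
# key_age_days <added-epoch> <now-epoch>: whole days between two Unix timestamps
key_age_days() {
  echo $(( ($2 - $1) / 86400 ))
}

# report_old_keys [max-days]: print keys/ files first committed more than max-days ago.
# Must run inside the repository; --diff-filter=A selects commits that added the file,
# and tail keeps the oldest one because git log lists newest first.
report_old_keys() {
  max_days="${1:-90}"
  now=$(date +%s)
  find keys -type f | while read -r key_file; do
    added=$(git log --diff-filter=A --format=%at -- "$key_file" | tail -n 1)
    [ -n "$added" ] || continue   # file not committed yet
    days=$(key_age_days "$added" "$now")
    if [ "$days" -gt "$max_days" ]; then
      echo "$key_file is $days days old"
    fi
  done
}
```

The weekly audit workflow is a natural place to call report_old_keys; note that actions/checkout needs fetch-depth: 0 there, otherwise only the latest commit is available.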
Disaster Recovery
Prepare for these scenarios:
Lost Private Key
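# Prevention: Back up keys in a secure password manager
# Recovery: the employee generates a new keypair, an administrator swaps the
# old public key for the new one in .sops.yaml, then re-encrypts every secret
# (if the key may have been stolen rather than lost, treat it as compromised)
find secrets -name "*.enc.yaml" -exec sops updatekeys -y {} \;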
Compromised Key
# Immediate response: remove the key, re-encrypt everything, and rotate secrets
./scripts/offboard.sh --name compromised_employee --rotate-secrets --non-interactive
# Follow-up
# - Audit access logs
# - Review git history for exposed secrets
# - Update security policies
Corrupted Secret File
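# Recovery from Git history
git log --follow secrets/production/database.enc.yaml
git checkout <last-known-good> -- secrets/production/database.enc.yaml
# The restored copy is encrypted to that commit's recipients; re-apply the current ones
sops updatekeys -y secrets/production/database.enc.yaml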
Performance Considerations
SOPS performance tips for large deployments:
- Batch operations: Re-encrypt multiple files in parallel
find secrets -name "*.enc.yaml" | \
parallel -j 4 'sops updatekeys -y {}'
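- Selective encryption: Only encrypt sensitive values
# .sops.yaml
# encrypted_regex belongs inside a creation rule: only values whose keys match
# it are encrypted, and everything else stays readable plaintext
creation_rules:
  - path_regex: secrets/.*\.enc\.yaml$
    encrypted_regex: '^(data|stringData|password|apiKey|token|secret|key|credential)$'
    # key_groups as in the configuration shown earlier
- Use environment variables: Set SOPS_AGE_KEY_FILE (path to a key file) or SOPS_AGE_KEY (the key itself, convenient in CI) so every sops call finds the key without extra flags
- Optimize script execution: All scripts support non-interactive mode for faster batch operations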
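If GNU parallel is not installed, xargs can fan out the same work: both GNU xargs and the BSD xargs shipped with macOS accept -P for concurrent jobs, although -P is not part of POSIX. In the sketch below, update_all_keys and the SOPS_CMD variable are hypothetical; SOPS_CMD exists only so the fan-out can be dry-run with echo, and it defaults to sops.

```shell
#!/bin/sh
# Re-encrypt every *.enc.yaml under a directory, N files at a time.
# Usage: update_all_keys [dir] [jobs]
# Set SOPS_CMD=echo to print the commands instead of running sops (hypothetical hook).
update_all_keys() {
  dir="${1:-secrets}"
  jobs="${2:-4}"
  # -print0 / -0 keep file names with spaces intact; -n 1 passes one file per call.
  # ${SOPS_CMD:-sops} is deliberately unquoted so it can hold a command name.
  find "$dir" -type f -name '*.enc.yaml' -print0 |
    xargs -0 -n 1 -P "$jobs" ${SOPS_CMD:-sops} updatekeys -y
}
```

Running sops once per file keeps a failure on one file from aborting the others; xargs still exits non-zero if any invocation failed.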
Security Best Practices
Defense in Depth
Layer your security controls:
- Repository level: Branch protection, required reviews
- SOPS level: Encryption, access control
- CI/CD level: Environment protection, audit logs
- Runtime level: Least privilege, secret rotation
- Monitoring level: Alerting, anomaly detection
Common Pitfalls to Avoid
1. Committing Plaintext Secrets
# Bad: Forgetting to encrypt
$ echo "password: secret" > secrets/dev/config.yaml
$ git add . && git commit # EXPOSED!
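# Good: Create and edit secrets through sops, so no plaintext file is written to the repo.
# The path matches the secrets/dev creation rule, so the right recipients are applied.
$ sops secrets/dev/config.enc.yaml   # opens $EDITOR, encrypts on save
$ git add secrets/dev/config.enc.yaml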
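A Git pre-commit hook turns the "always encrypt" rule into an automatic check. The sketch below is hypothetical, and the repository's own tooling may differ: it inspects the staged version of every added or modified file under secrets/ and rejects any file without SOPS metadata. A SOPS-encrypted YAML file has a top-level sops: block whose mac field is itself encrypted (ENC[AES256_GCM,...]), so requiring both is a cheap heuristic, not a cryptographic proof.

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-commit: block plaintext files under secrets/.
# looks_encrypted reads YAML on stdin and succeeds if it carries SOPS metadata.
looks_encrypted() {
  content=$(cat)
  printf '%s\n' "$content" | grep -q '^sops:' &&
    printf '%s\n' "$content" | grep -q 'mac: ENC\[AES256_GCM'
}

check_staged_secrets() {
  # Added, copied or modified files under secrets/ in the index
  git diff --cached --name-only --diff-filter=ACM -- secrets/ | {
    status=0
    while IFS= read -r f; do
      # ":$f" is the staged blob, i.e. exactly what would be committed
      if ! git show ":$f" | looks_encrypted; then
        echo "ERROR: $f is not SOPS-encrypted; refusing to commit" >&2
        status=1
      fi
    done
    exit "$status"
  }
}

check_staged_secrets || exit 1
```

Save it as .git/hooks/pre-commit and make it executable; the CI validation workflow remains the backstop, since local hooks can be skipped with git commit --no-verify.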
2. Weak Access Control
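# Bad: Everyone has production access
# (keys listed individually: a YAML alias cannot merge two lists into one)
creation_rules:
  - path_regex: .*
    key_groups:
      - age:
          - *alice_key
          - *bob_key
          - *admin1_key
          - *admin2_key
# Good: Environment-based restrictions
creation_rules:
  - path_regex: secrets/production/.*
    key_groups:
      - age: *administrators # Only admins; the alias expands to the list of admin keys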
3. No Key Rotation
# Bad: Using the same keys for years
# Keys never expire with age, but rotation is still important
# Good: Quarterly rotation schedule (crontab entry)
0 0 1 */3 * /path/to/scripts/rotate-secrets.sh --all --notify
Compliance Considerations
For regulated industries, ensure:
- Audit logging: All access and changes logged
- Separation of duties: Developers can’t access production
- Key rotation: Regular rotation schedule
- Access reviews: Quarterly access audits
- Data residency: Keys and secrets in approved regions
Production Insights
After implementing this system across multiple organizations, here are key insights from real-world deployments:
What Works Well
- YAML anchors significantly reduce configuration errors and maintenance overhead
- Automated scripts ensure consistent processes and reduce human error
- Git-based workflow provides natural audit trail with zero additional infrastructure
- Role separation prevents unauthorized access while maintaining usability
- CI/CD integration enables secure deployments without manual intervention
- Non-interactive mode allows full automation and integration with other tools
- Cross-platform compatibility works seamlessly on macOS, Linux, and WSL
What To Watch For
- Key sprawl: Regular audits prevent accumulation of unused keys; use scripts/list-keys.sh
- Onboarding delays: Have backup administrators across time zones for 24/7 coverage
- Rotation coordination: Communicate rotation schedules to teams before executing
- Backup strategies: Ensure multiple admins have recovery keys stored securely
- Training needs: Developers need education on the new workflow - provide runbooks
- Git conflicts: Multiple simultaneous onboardings can cause merge conflicts in .sops.yaml
- Performance impact: Re-encrypting many files can be slow - use parallel processing
Team Adoption Strategies
Successfully rolling out this system requires:
- Start small: Begin with development environment
- Document everything: Clear runbooks and troubleshooting guides
- Provide tooling: Scripts support both interactive and CLI modes for automation
- Training sessions: Hands-on workshops for teams
- Champion program: Identify power users in each team
- Gradual migration: Don’t force immediate adoption
The included scripts all support non-interactive mode for CI/CD integration:
# Examples of non-interactive usage
./scripts/onboard.sh --name alice --role developer --key age1... --non-interactive
./scripts/offboard.sh --name bob --rotate-secrets --non-interactive
./scripts/verify-access.sh --non-interactive --json
Advanced Topics
Script Automation Examples
Non-interactive mode also makes the scripts easy to combine with other tools:
# Batch onboarding from CSV
while IFS=, read -r name role pubkey; do
./scripts/onboard.sh --name "$name" --role "$role" --key "$pubkey" --non-interactive
done < employees.csv
# Automated offboarding with Slack notification
./scripts/offboard.sh --name "$employee" --rotate-secrets --non-interactive
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"Offboarded $employee and rotated secrets\"}" \
"$SLACK_WEBHOOK_URL"
# JSON output for monitoring dashboards
./scripts/verify-access.sh --non-interactive --json | jq '.accessible_environments'
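The batch onboarding loop above processes every line, including a header row, and silently skips a last row that has no trailing newline. A slightly more defensive version is sketched below. It only prints the commands, a dry run you can review before executing them, and the CSV layout (name,role,pubkey with an optional header) is an assumption.

```shell
#!/bin/sh
# Dry run: print one onboard.sh command per employee row in a CSV file.
# Assumed columns: name,role,pubkey. A header row and blank or #-comment lines are skipped.
onboard_commands_from_csv() {
  # "|| [ -n "$name" ]" keeps a final row that lacks a trailing newline
  while IFS=, read -r name role pubkey || [ -n "$name" ]; do
    case "$name" in
      ''|name|'#'*) continue ;;
    esac
    # Strip Windows line endings left by spreadsheet exports
    pubkey=$(printf '%s' "$pubkey" | tr -d '\r')
    echo "./scripts/onboard.sh --name $name --role $role --key $pubkey --non-interactive"
  done < "$1"
}
```

Because each onboarding run re-encrypts every secret, reviewing the list first is cheap insurance against onboarding the wrong person or role.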
Multi-Region Deployments
For global deployments, consider:
# Regional key management
creation_rules:
- path_regex: secrets/us-east-1/.*
kms: arn:aws:kms:us-east-1:xxx:key/xxx
- path_regex: secrets/eu-west-1/.*
kms: arn:aws:kms:eu-west-1:xxx:key/xxx
Integration with Kubernetes
Deploy secrets directly to Kubernetes:
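# Decrypt and create (or update) the Kubernetes secret; the dry-run/apply
# pair makes the command safe to re-run when the secret already exists
sops -d secrets/production/database.enc.yaml | \
  kubectl create secret generic db-credentials \
    --from-file=config=/dev/stdin \
    --namespace=production \
    --dry-run=client -o yaml | \
  kubectl apply -f -
For GitOps workflows, Flux can decrypt SOPS-encrypted manifests natively; Sealed Secrets and External Secrets Operator are alternative approaches.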
Terraform Integration
Use SOPS with Terraform for infrastructure secrets:
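# terraform/main.tf
terraform {
  required_providers {
    sops = { source = "carlpett/sops" }
  }
}
data "sops_file" "secrets" {
  source_file = "secrets/terraform/aws.enc.yaml"
}
resource "aws_db_instance" "database" {
  # ... other required arguments omitted
  # Note: the decrypted password is stored in Terraform state; protect the state backend
  password = data.sops_file.secrets.data["database_password"]
}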
Conclusion
Building a robust secrets management system doesn’t require complex infrastructure or expensive solutions. With SOPS, age encryption, and well-designed processes, you can create a system that is:
- Secure: End-to-end encryption with role-based access control
- Auditable: Complete Git-based audit trail of all changes
- Scalable: From startups to enterprises (tested with 100+ secrets)
- Developer-friendly: Works with existing Git workflows
- Cost-effective: Zero infrastructure cost, no servers to maintain
- Cross-platform: Runs on macOS, Linux, and Windows (WSL)
- Automation-ready: Full CLI support for CI/CD integration
The implementation we’ve walked through provides a production-ready foundation that you can adapt to your organization’s specific needs. All scripts have been tested in production environments and support both interactive use for humans and non-interactive mode for automation.
Start with the basics, add automation gradually, and continuously improve based on your team’s feedback. The included test suite (full-tests.sh) makes it safe to experiment and learn the complete workflow.
Remember: the best secrets management system is one that your team will actually use. Make it simple, make it secure, and make it part of the natural workflow.
Pro tip: Start by running ./full-tests.sh to see the complete system in action and validate your setup. The test script is non-destructive and will help you understand the workflow before deploying to production. It demonstrates all key features including onboarding, access control, and offboarding.
Resources and Next Steps
Get Started
- Clone the repository: github.com/cgoolsby/sops-for-companies
- Install prerequisites:
# macOS
brew install sops age
# Linux
# Download SOPS from https://github.com/getsops/sops/releases
# Install age: go install filippo.io/age/cmd/...@latest
- Generate your first keypair:
age-keygen -o key.txt # Save the output securely!
- Encrypt your first secret:
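echo "password: supersecret" > secret.yaml
# Outside a repository with a matching .sops.yaml rule, name the recipient explicitly
sops -e --age "$(age-keygen -y key.txt)" secret.yaml > secret.enc.yaml
rm secret.yaml # Never leave plaintext!
# Decrypt with: SOPS_AGE_KEY_FILE=key.txt sops -d secret.enc.yaml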
- Run the test suite to see everything in action:
./full-tests.sh
# This demonstrates the complete lifecycle:
# - Onboarding developers and administrators
# - Verifying role-based access control
# - Offboarding with secret rotation
Security Checklist
- Never commit private keys - Use .gitignore patterns
- Rotate keys quarterly - Schedule with scripts/rotate-secrets.sh
- Audit access regularly - Weekly GitHub Actions workflow included
- Test disaster recovery - Practice key recovery procedures
- Monitor for anomalies - Track failed decryption attempts
- Use strong passphrases - Protect private keys with password managers
- Implement MFA - Require for production deployments
- Regular backups - Keep encrypted backups of critical keys
Need Help?
This implementation has been battle-tested in production environments. If you encounter issues:
- Check the scripts - All scripts support --help for usage information
- Run the test suite - ./full-tests.sh validates your setup
- Review audit logs - Check offboarding_audit.log for history
- Open an issue - Report bugs or suggest improvements in the repository
Common troubleshooting:
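- “mapfile: command not found” - mapfile is a Bash 4 builtin that the Bash 3.2 shipped with macOS lacks; the scripts are written to run on Bash 3.2+, so make sure you have the latest scripts, or run them with a newer Bash
- “cannot decrypt” - Verify your key is in .sops.yaml and that the secrets were re-encrypted with sops updatekeys
- “permission denied” - Ensure scripts are executable: chmod +x scripts/*.sh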
This implementation has been tested with teams ranging from 5 to 500+ developers. If you found this helpful, please share it with your team. Secure secrets management is everyone’s responsibility, and the more organizations that implement proper controls, the safer we all become.
All code in this post is from a working implementation available in the accompanying repository. The scripts have been tested on macOS (10.15+), Ubuntu (20.04+), and Windows WSL2.
Tags: #DevSecOps #SecretsManagement #SOPS #Security #GitOps #InfrastructureAsCode