CI/CD Pipeline Security: Enterprise Defense in Depth
At scale, CI/CD systems become critical infrastructure with hundreds or thousands of pipelines deploying to production daily. A compromise doesn’t just mean one bad deployment - it means an attacker with persistent access to your entire deployment mechanism.
This deep-water content addresses the hardest problems: hermetic builds, supply chain attack analysis, policy enforcement, zero-trust architectures, and incident response for sophisticated threats.
SLSA Level 4: Hermetic and Reproducible Builds
SLSA Level 4 is genuinely difficult. It requires hermetic builds where two builds from the same source code produce bit-for-bit identical artifacts. This matters for:
- Verifying build integrity - If your binary doesn’t match the reference build, something’s wrong
- Forensics - Reproduce the exact build to investigate compromises
- Compliance - Some regulated industries require reproducible builds
What Makes a Build Hermetic?
A hermetic build has no external inputs except:
- Source code (at a specific commit)
- Explicitly declared dependencies (pinned versions)
- Build toolchain (compiler, runtime)
Everything else is eliminated:
- No network access during build
- No access to filesystem outside build directory
- No ambient environment variables
- No timestamps or build machine hostnames
- No randomness in output
The Challenge: Timestamps and Non-Determinism
Many tools embed timestamps, file paths, or random data in build artifacts:
# Two builds of the same code produce different binaries:
$ make build
$ sha256sum ./output
abc123...
$ make build
$ sha256sum ./output
def456... # Different!
Common sources of non-determinism:
- Timestamps - Embedded
__DATE__macros in C/C++, modification times in archives - Build paths - Absolute paths baked into debug symbols
- Randomness - ASLR offsets, random padding
- Ordering - Non-deterministic hash table iteration, parallel build ordering
- Environment - Username, hostname, timezone affecting build
Making Builds Reproducible
Strip timestamps:
# Instead of:
COPY . /app
# Use fixed timestamp:
COPY --chmod=0755 --chown=1000:1000 . /app
RUN find /app -exec touch -t 202301010000.00 {} +
Fix environment:
# Bazel build (Google's hermetic build system)
build --action_env=TZ=UTC
build --action_env=SOURCE_DATE_EPOCH=1672531200
build --incompatible_strict_action_env
Pin everything:
# Not reproducible - "latest" changes
FROM node:18
# Reproducible - specific digest
FROM node:18.17.1@sha256:a6385a6bb2fdcb7c48fc871e35e32af8daaa82c518f508a5f2424f988d60c6a9
# Pin build tools too
RUN apt-get update && \
apt-get install -y --no-install-recommends \
make=4.3-4.1 \
git=1:2.30.2-1
Disable network during build:
# Dockerfile build with network isolation
RUN --network=none make build
With Docker BuildKit:
docker build --network=none -t myapp .
Verification
After implementing reproducibility:
# Build 1
docker build -t myapp:build1 .
docker save myapp:build1 > build1.tar
sha256sum build1.tar
# Build 2 (different machine, different time)
docker build -t myapp:build2 .
docker save myapp:build2 > build2.tar
sha256sum build2.tar
# Hashes should match exactly
Google’s Approach: Bazel
Bazel is designed for hermetic builds from the start:
# BUILD.bazel
go_binary(
name = "myapp",
srcs = ["main.go"],
deps = [
"@com_github_gin_gonic_gin//:gin", # Pinned external dependency
],
# Bazel ensures build is hermetic
)
Bazel advantages:
- Sandboxed execution (build can’t access network or filesystem)
- Content-based caching (rebuilds only what changed)
- Remote execution (distribute build across machines)
Bazel disadvantages:
- Steep learning curve
- Requires rewriting build files
- Ecosystem support varies by language
Is SLSA 4 Worth It?
When to invest:
- Highly regulated industries (finance, healthcare, defense)
- Open-source projects needing verifiable builds
- Critical infrastructure (OS distributions, security tools)
- Organizations with history of supply chain incidents
When SLSA 3 is enough:
- Most SaaS companies
- Internal tools
- Fast-moving startups
- Projects where slight non-determinism is acceptable
Google achieves SLSA 4 for most internal builds, but they have hundreds of engineers maintaining build infrastructure. For most organizations, SLSA 3 (isolated builds with signed provenance) provides the right security/complexity trade-off.
Supply Chain Attack Deep Dive
Understanding actual attack techniques helps design defenses.
Attack Vector 1: Malicious Dependencies
Sophisticated variant - Gradual compromise:
- Attacker publishes legitimate package to npm
- Builds reputation over months (stars, downloads, community)
- Version 1.0-1.5: Completely benign
- Version 1.6: Adds innocent-looking code with subtle backdoor
- Version 1.7: Backdoor activates only in production environments
- Version 1.8: Backdoor connects to C2 server only after 30 days
Detection is hard because:
- Code review sees small incremental changes
- Automated scanning doesn’t catch targeted logic
- Trigger conditions avoid sandbox detection
Defense:
# Policy: Require security review for new dependencies
# .github/workflows/dependency-check.yml
name: Dependency Review
on: [pull_request]
jobs:
review:
runs-on: ubuntu-latest
steps:
- name: Check for new dependencies
uses: actions/dependency-review-action@v3
with:
fail-on-severity: moderate
# Custom policy: Any new dependency triggers manual review
deny-licenses: GPL-3.0, AGPL-3.0
Behavioral analysis:
# Example: Detect suspicious behavior in dependencies
import subprocess
import sys
def check_dependency_behavior(package_name):
"""
Run package installation in instrumented environment
to detect suspicious behavior
"""
suspicious_behaviors = [
"network connections to unknown IPs",
"filesystem access outside package directory",
"spawning child processes",
"environment variable enumeration",
"crypto mining signatures"
]
# Use sandboxed environment (Firejail, Docker, etc.)
result = subprocess.run(
["firejail", "--net=none", "pip", "install", package_name],
capture_output=True
)
# Analyze strace logs, network attempts, filesystem access
# Flag anything suspicious for manual review
Attack Vector 2: Repository Compromise
Real example: CodeCov (2021)
Attackers modified CodeCov’s Bash uploader script:
# Original (legitimate)
curl -s https://codecov.io/bash | bash
# Attackers modified the script on CodeCov's servers
# Script exfiltrated environment variables (including CI secrets)
# Sent to attacker-controlled servers
29,000 customers potentially compromised because they piped remote scripts to bash.
Defense - Verified downloads:
# Never do this:
curl https://example.com/install.sh | bash
# Instead:
curl -O https://example.com/install.sh
curl -O https://example.com/install.sh.sha256
# Verify checksum
sha256sum -c install.sh.sha256
# Inspect script before running
less install.sh
# Run only after verification
bash install.sh
Better: Use package managers
# Instead of downloading scripts:
- name: Install tool
run: |
wget https://example.com/tool.tar.gz
tar xzf tool.tar.gz
# Use versioned package:
- uses: hashicorp/setup-terraform@v2
with:
terraform_version: 1.5.7 # Specific version
Attack Vector 3: Build Environment Persistence
Scenario:
- Attacker compromises shared build server
- Plants malicious script in
/usr/local/bin/ - Script modifies binaries during compilation
- All subsequent builds are compromised
- Attack persists even after attacker access is removed
Example poisoned build:
# Attacker's malicious /usr/local/bin/gcc wrapper
#!/bin/bash
# Call real gcc
/usr/bin/gcc "$@"
# If building auth-related code, inject backdoor
if [[ "$@" == *"auth"* ]]; then
# Inject malicious code into binary
/usr/local/bin/inject-backdoor $output_file
fi
Defense - Immutable build infrastructure:
# Don't use long-lived build servers
# Use ephemeral environments
# GitHub Actions (isolated by default)
runs-on: ubuntu-latest # Fresh VM every time
# Self-hosted runners - use VM snapshots
runs-on: self-hosted
# Revert to clean snapshot after each build
For self-hosted infrastructure:
# Terraform: Immutable build workers
resource "aws_instance" "build_worker" {
ami = var.builder_ami # Golden image
instance_type = "c5.xlarge"
# Instance terminates after use
instance_initiated_shutdown_behavior = "terminate"
# Tag for lifecycle management
tags = {
Purpose = "ephemeral-build-worker"
TTL = "2h" # Auto-terminate after 2 hours
}
}
Attack Vector 4: Dependency Confusion
Advanced variant - Targeting scoped packages:
// package.json
{
"dependencies": {
"@mycompany/core-utils": "^1.0.0"
}
}
If you forget to configure registry for @mycompany scope:
# Checks private registry - not found
# Falls back to public npm
# Attacker has published @mycompany/core-utils to public npm
# Malicious package gets installed
npm install
Comprehensive defense:
# .npmrc - Lock down all scopes
@mycompany:registry=https://npm.pkg.github.com/
@mycompany:_authToken=${GH_TOKEN}
# Block access to public registry for scoped packages
registry=https://registry.npmjs.org/
@mycompany:registry=https://npm.pkg.github.com/
# Verify packages match expected signatures
strict-ssl=true
Organization-level protection:
# Registry-side defense: Reserve namespace
# Contact npm Enterprise, GitHub Packages, or Artifactory
# Request namespace reservation for @mycompany
# Only your organization can publish packages under that scope
Policy Enforcement with OPA and Kyverno
At enterprise scale, you can’t manually review every deployment. Policy-as-code enforces security requirements automatically.
Open Policy Agent (OPA)
OPA is a general-purpose policy engine. You write policies in Rego (a declarative language), and OPA evaluates them.
Use case: Verify container images are signed
# policy.rego
package kubernetes.admission
import future.keywords.if
deny[msg] if {
input.request.kind.kind == "Pod"
image := input.request.object.spec.containers[_].image
# Check if image signature exists in Rekor
not image_is_signed(image)
msg := sprintf("Image %v is not signed", [image])
}
image_is_signed(image) if {
# Query Rekor transparency log
# Verify signature exists for this image digest
# Implementation depends on your signing infrastructure
signature := http.send({
"method": "GET",
"url": sprintf("https://rekor.sigstore.dev/api/v1/log/entries?logIndex=%v", [image])
})
signature.status_code == 200
}
Integration with Kubernetes:
# Install OPA Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml
# Apply constraint template
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: requiresignedimages
spec:
crd:
spec:
names:
kind: RequireSignedImages
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package requiresignedimages
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not is_signed(container.image)
msg := sprintf("Container image %v is not signed", [container.image])
}
is_signed(image) {
# Call out to Cosign verification
# Or check internal database of signed images
}
Result: Unsigned images can’t deploy, even if someone bypasses code review.
Kyverno (Kubernetes-Native Alternative)
Kyverno uses YAML for policies instead of Rego, making it more accessible.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: verify-image-signatures
spec:
validationFailureAction: enforce
background: false
webhookTimeoutSeconds: 30
failurePolicy: Fail
rules:
- name: verify-signature
match:
any:
- resources:
kinds:
- Pod
verifyImages:
- imageReferences:
- "myregistry.io/*"
attestors:
- count: 1
entries:
- keyless:
subject: "https://github.com/myorg/myrepo/.github/workflows/*"
issuer: "https://token.actions.githubusercontent.com"
rekor:
url: https://rekor.sigstore.dev
This policy verifies every image from myregistry.io was signed by your GitHub Actions workflows.
Policy Use Cases
1. Enforce SLSA provenance:
# Require all production deployments have SLSA attestation
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-slsa-attestation
spec:
rules:
- name: check-attestation
match:
resources:
kinds:
- Deployment
namespaces:
- production
verifyImages:
- attestations:
- predicateType: https://slsa.dev/provenance/v0.2
conditions:
- all:
- key: "{{ builder.id }}"
operator: In
value: ["https://github.com/myorg/myrepo/.github/workflows/build.yml@refs/heads/main"]
2. Block images with known vulnerabilities:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: block-vulnerable-images
spec:
rules:
- name: check-vulnerabilities
match:
resources:
kinds:
- Pod
preconditions:
all:
- key: "{{ request.operation }}"
operator: In
value: [CREATE, UPDATE]
validate:
message: "Image has critical vulnerabilities"
foreach:
- list: "request.object.spec.containers"
deny:
conditions:
any:
- key: "{{ scan_result(element.image).critical_vulns }}"
operator: GreaterThan
value: 0
3. Enforce least privilege:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-non-root
spec:
rules:
- name: check-containers-not-root
match:
resources:
kinds:
- Pod
validate:
message: "Containers must not run as root"
pattern:
spec:
containers:
- securityContext:
runAsNonRoot: true
Zero-Trust CI/CD Architecture
Traditional CI/CD assumes trust within the build network. Zero-trust assumes everything is potentially compromised.
Principles
- Verify explicitly - Always authenticate and authorize, never assume trust
- Least privilege - Minimal access needed for each job
- Assume breach - Design for compromise containment
Reference Architecture
┌─────────────────────────────────────────────────────────────┐
│ Developer Workstation │
│ │
│ 1. git push (signed commit) │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Source Repository (GitHub/GitLab) │
│ │
│ 2. Webhook triggers pipeline │
│ 3. OIDC token issued to workflow │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Isolated Build Environment (Ephemeral VM) │
│ │
│ 4. Verify commit signature │
│ 5. Fetch code (read-only) │
│ 6. Build in sandbox (no network) │
│ 7. Generate SLSA provenance │
│ 8. Sign artifact with OIDC identity │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Artifact Storage (OCI Registry) │
│ │
│ 9. Store signed artifact + provenance │
│ 10. Transparency log entry (Rekor) │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Policy Gate (OPA/Kyverno) │
│ │
│ 11. Verify signature │
│ 12. Check SLSA provenance │
│ 13. Validate vulnerability scan results │
│ 14. Enforce deployment policies │
└────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Production Environment │
│ │
│ 15. Pull verified artifact │
│ 16. Deploy with workload identity │
│ 17. Runtime attestation verification │
└─────────────────────────────────────────────────────────────┘
Key Security Controls
Identity-based access:
# No long-lived credentials anywhere
# Build uses OIDC:
- AWS IAM role trusts GitHub OIDC provider
- Role policy specifies: "only main branch from specific repo"
- Temporary credentials issued for job duration
- Credentials expire after 1 hour
# Deployment uses workload identity:
- Kubernetes ServiceAccount mapped to cloud IAM role
- Pod can only access resources its role permits
- No secrets stored in cluster
Network segmentation:
# Build network cannot reach production
# Deployment network is separate
# Strict firewall rules between segments
# Example: AWS Security Groups
resource "aws_security_group" "build_network" {
egress {
# Build can access artifact registry
to_port = 443
protocol = "tcp"
cidr_blocks = [var.registry_cidr]
}
# No ingress allowed
# No other egress allowed
}
resource "aws_security_group" "production_network" {
# Production cannot be reached from build network
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [var.user_traffic_cidr]
}
}
Monitoring and alerting:
# Detect unusual activity
alerts:
- name: UnusualBuildBehavior
query: |
# Alert if build job makes unexpected network calls
network_connections{job="build"} and not (
destination in ["registry.io", "github.com"]
)
- name: AnomalousDeployment
query: |
# Alert if deployment happens outside business hours
deployment_created{hour < 8 or hour > 18}
- name: PrivilegeEscalation
query: |
# Alert if pipeline role assumes higher privileges
iam_assume_role{target_role=~".*admin.*"}
Incident Response Playbook
Scenario 1: Compromised CI/CD Secrets
Detection triggers:
- Cloud provider alerts about API calls from unusual IPs
- Unexpected deployments to production
- Third-party services report unauthorized access
- Security scanning detects malicious code in recent build
Response procedure:
Phase 1: Contain (First 30 minutes)
# 1. Immediately revoke compromised credentials
aws iam delete-access-key --access-key-id AKIA...
# 2. Suspend all pipeline executions
gh api -X POST /repos/OWNER/REPO/actions/runs/RUN_ID/cancel
# 3. Block network access from build infrastructure
aws ec2 revoke-security-group-ingress --group-id sg-... --cidr 0.0.0.0/0
# 4. Enable enhanced logging
aws cloudtrail create-trail --name incident-response-trail --s3-bucket-name forensics-bucket
Phase 2: Investigate (Hours 1-4)
# Audit all deployments in compromise window
gh api /repos/OWNER/REPO/actions/runs \
--jq '.workflow_runs[] | select(.created_at > "2024-01-01T00:00:00Z") | {id, name, head_sha, created_at}'
# Check what credentials accessed
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIA... \
--max-items 1000 \
--output json > cloudtrail-forensics.json
# Analyze API calls
cat cloudtrail-forensics.json | jq '.Events[] | {time: .EventTime, event: .EventName, resource: .Resources, ip: .SourceIPAddress}'
# Check for data exfiltration
aws s3api list-objects --bucket production-data --query 'Contents[?LastModified>`2024-01-01`]'
Phase 3: Eradicate (Hours 4-8)
# Rotate all secrets (not just compromised one)
# Assumption: If one leaked, others might have too
# Generate new credentials
aws iam create-access-key --user-name ci-deployment-user
# Update CI platform with new secrets
gh secret set AWS_ACCESS_KEY_ID --body "$NEW_KEY_ID"
gh secret set AWS_SECRET_ACCESS_KEY --body "$NEW_SECRET"
# Delete old credentials
aws iam delete-access-key --access-key-id AKIA_OLD...
# Review and potentially rebuild recent deployments
for sha in $(git log --since="2024-01-01" --format=%H); do
# Verify commit signature
git verify-commit $sha
# Check if build artifacts are signed
cosign verify myregistry.io/myapp:$sha
# If verification fails, rebuild from source
./rebuild.sh $sha
done
Phase 4: Recovery (Hours 8-24)
# Deploy known-good version
# From before compromise started
git checkout KNOWN_GOOD_SHA
./deploy.sh production
# Restore monitoring and automation
# With enhanced security controls
# Implement lessons learned:
# - Move to workload identity (no stored secrets)
# - Add policy gates (OPA/Kyverno)
# - Enable artifact signing
# - Increase audit logging
Phase 5: Post-Incident (Week 1-2)
# Incident Report Template
## Summary
- **Date**: 2024-01-15
- **Duration**: 8 hours (detection to full recovery)
- **Impact**: Unauthorized access to production AWS account
- **Root cause**: CI/CD secret exposed in repository history
## Timeline
- 00:00: Attacker discovers AWS credentials in old commit
- 02:30: Unusual API calls detected by CloudWatch
- 03:00: Security team alerted
- 03:15: Credentials revoked
- 04:00: Investigation begins
- 08:00: All secrets rotated, systems recovered
- 12:00: Enhanced monitoring deployed
## What went well
- Detection within 2.5 hours
- Rapid credential revocation
- No customer data accessed
## What went poorly
- Secret was in Git history for 6 months before detection
- Manual secret rotation took 4 hours (should be automated)
- No policy preventing secrets in code
## Action items
- [x] Implement pre-commit hooks to block secrets
- [x] Enable GitHub secret scanning
- [x] Migrate to workload identity (no long-lived credentials)
- [x] Automate secret rotation
- [ ] Conduct tabletop exercise for next incident
Scenario 2: Malicious Code in Production
Detection:
- Security team reports backdoor in production deployment
- Unexpected network traffic from production to unknown IPs
- Customer reports suspicious behavior
Immediate response:
# 1. Identify malicious deployment
kubectl get deployments -n production -o json | \
jq '.items[] | {name, image, created: .metadata.creationTimestamp}'
# 2. Rollback to last known-good version
kubectl rollout undo deployment/myapp -n production
# 3. Block malicious image
cat <<EOF | kubectl apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: block-compromised-image
spec:
validationFailureAction: enforce
rules:
- name: block-bad-image
match:
resources:
kinds:
- Pod
validate:
message: "Compromised image blocked"
deny:
conditions:
- key: "{{ images.*.digest }}"
operator: Equals
value: "sha256:MALICIOUS_DIGEST"
EOF
# 4. Quarantine affected nodes
kubectl cordon node-1 node-2 node-3
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
Forensic analysis:
# Extract malicious image for analysis
docker pull myregistry.io/myapp@sha256:MALICIOUS_DIGEST
docker save myregistry.io/myapp@sha256:MALICIOUS_DIGEST > malicious-image.tar
# Analyze image layers
docker history myregistry.io/myapp@sha256:MALICIOUS_DIGEST
# Extract filesystem
mkdir image-contents
tar -xf malicious-image.tar -C image-contents
# Search for suspicious files
find image-contents -name "*.sh" -o -name "*.elf" | xargs strings | grep -i "backdoor\|shell\|payload"
# Check network connections if container ran
# Get pod that ran the malicious image
kubectl logs -n production POD_NAME --previous | grep -E "connect|socket|bind"
Determine scope:
# How did malicious code get into pipeline?
# Check build logs
gh api /repos/OWNER/REPO/actions/runs/RUN_ID/logs
# Review recent pipeline changes
git log --since="1 month ago" -- .github/workflows/
# Check for compromised dependencies
npm audit --json > audit.json
cat audit.json | jq '.vulnerabilities | to_entries[] | select(.value.severity=="critical")'
Economic Analysis of CI/CD Security
Cost of Prevention
Baseline security (SLSA 1-2):
- Engineering time: 40 hours initial setup
- Tooling: $0-500/month (GitHub Actions, open source tools)
- Maintenance: 10 hours/month
- Annual cost: ~$20,000 (engineering time) + $6,000 (tools) = $26,000
Advanced security (SLSA 3):
- Engineering time: 200 hours initial (workload identity, policy enforcement, signing)
- Tooling: $2,000/month (Sigstore infrastructure, policy engines, monitoring)
- Maintenance: 40 hours/month
- Annual cost: ~$100,000 (engineering time) + $24,000 (tools) = $124,000
Enterprise security (SLSA 4):
- Engineering time: 1,000 hours initial (hermetic builds, reproducibility)
- Dedicated security team: 2 FTEs
- Tooling: $10,000/month (enterprise tools, support contracts)
- Annual cost: ~$300,000 (team) + $120,000 (tools) = $420,000
Cost of Breach
CircleCI breach (2022-2023):
- Customer engineering time rotating secrets: 29,000 customers × 8 hours = 232,000 hours
- At $100/hour: $23,200,000 in customer labor
- CircleCI’s costs: Investigation, incident response, customer support, reputation damage
- Estimated total: $50-100M
SolarWinds breach (2020):
- Remediation costs: >$100M
- Legal fees, settlements: Unknown (ongoing)
- Reputation damage: Stock price dropped 25%
- Customer trust: Permanent impact
CodeCov breach (2021):
- 29,000 potentially compromised customers
- Estimated customer cost to rotate credentials: $10-50M aggregate
- CodeCov’s reputation: Significant damage
Break-Even Analysis
If your organization has:
- 100 engineers ($100/hour)
- 20 services in production
- SaaS business with $50M ARR
Cost of breach (conservative):
- Engineering time investigating: 100 engineers × 40 hours = $400,000
- Credential rotation: 200 hours across teams = $20,000
- Customer notification: 40 hours = $4,000
- Reputation damage: 10% customer churn = $5,000,000
- Total: ~$5.4M
Probability of breach over 5 years:
- Without CI/CD security: ~15% (based on industry data)
- With SLSA 2: ~5%
- With SLSA 3: ~1%
Expected value:
- No security: 0.15 × $5.4M = $810,000 expected loss
- SLSA 2: 0.05 × $5.4M = $270,000 expected loss + $26,000 annual cost = $296,000
- SLSA 3: 0.01 × $5.4M = $54,000 expected loss + $124,000 annual cost = $178,000
Result: SLSA 3 pays for itself if there’s even a 1% chance of breach over 5 years.
Implementation Roadmap for Enterprise Deployment
Months 1-3: Foundation
Goals:
- Eliminate secrets from code
- Enable dependency scanning
- Implement basic access controls
Tasks:
- Audit all repositories for committed secrets
- Implement secret scanning (GitHub Advanced Security, git-secrets)
- Move secrets to CI platform secret managers
- Enable Dependabot or equivalent
- Set up CODEOWNERS for pipeline files
- Pin third-party actions to commit hashes
- Implement branch protection rules
Success metrics:
- Zero secrets in new commits
- 100% of repositories have dependency scanning
- All workflow changes require review
Months 4-6: Workload Identity
Goals:
- Eliminate long-lived credentials
- Implement OIDC authentication
Tasks:
- Configure OIDC trust in AWS/GCP/Azure
- Create IAM roles for each workflow
- Migrate workflows to workload identity
- Test credential expiration and renewal
- Rotate and delete old long-lived credentials
- Document OIDC setup for new services
Success metrics:
- 80% of workflows using workload identity
- Zero high-privilege long-lived credentials
- Automated credential lifecycle
Months 7-9: Artifact Signing
Goals:
- Achieve SLSA Level 2
- Implement artifact signing
Tasks:
- Set up Sigstore infrastructure (Cosign, Rekor)
- Integrate signing into build pipelines
- Generate SLSA provenance for builds
- Implement verification in deployment
- Create policy requiring signatures for production
- Train teams on verification
Success metrics:
- 100% of production deployments signed
- Provenance available for all artifacts
- Automated verification before deploy
Months 10-12: Policy Enforcement
Goals:
- Implement admission control
- Enforce security policies
Tasks:
- Deploy OPA Gatekeeper or Kyverno
- Create policies for image signing
- Implement vulnerability scanning gates
- Enforce SLSA provenance requirements
- Set up policy violation alerting
- Document policy exceptions process
Success metrics:
- Unsigned images cannot deploy
- High/critical vulnerabilities blocked
- Policy violations < 5/month
Months 13-18: Advanced Hardening
Goals:
- Achieve SLSA Level 3
- Implement hermetic builds
Tasks:
- Migrate to isolated build environments
- Implement reproducible builds
- Set up build caching and distribution
- Network isolation for builds
- Immutable build infrastructure
- Continuous compliance monitoring
Success metrics:
- Build reproducibility: 95%
- Build isolation: 100%
- Zero build contamination incidents
Ongoing: Monitoring and Improvement
Continuous tasks:
- Weekly security scanning review
- Monthly policy effectiveness review
- Quarterly threat modeling updates
- Annual penetration testing
- Ongoing team training
- Incident response drills
Lessons from Real Breaches
SolarWinds (2020): Build System Compromise
What happened:
- Attackers compromised SolarWinds’ build system
- Injected malicious code into Orion software
- Malware distributed via trusted software updates
- Affected 18,000+ customers including US government agencies
How it worked:
- Attackers gained access to build environment
- Modified source code before compilation
- Signed malicious binaries with SolarWinds’ legitimate certificate
- Updates distributed through normal channels
What would have prevented it:
- Hermetic builds - External code couldn’t have been injected
- Reproducible builds - Someone could have detected the mismatch
- Code signing with transparency logs - Anomalous signatures would have been visible
- Network isolation - Build environment couldn’t communicate with attacker C2
CodeCov (2021): Script Modification
What happened:
- Attacker modified CodeCov’s Bash uploader script
- Script ran in customer CI/CD pipelines
- Exfiltrated environment variables (secrets)
- 29,000 customers potentially affected
How it worked:
- Attacker gained access to CodeCov’s infrastructure
- Modified script served by codecov.io
- Customers ran
curl https://codecov.io/bash | bash - Script sent environment variables to attacker
What would have prevented it:
- Verified downloads - Checksum verification would have detected tampering
- Avoid piping to bash - Use package managers instead
- Secret scoping - Limited environment variables to necessary jobs
- Network monitoring - Unusual outbound connections would trigger alerts
CircleCI (2022-2023): Secret Exfiltration
What happened:
- Attacker compromised employee laptop
- Escalated to production systems
- Stole customer secrets from CircleCI’s secret storage
- Customers forced to rotate thousands of credentials
How it worked:
- Employee laptop compromised (likely malware)
- Attacker moved laterally through CircleCI network
- Accessed production secret storage
- Exfiltrated encrypted secrets and decryption keys
What would have prevented it:
- Workload identity - No secrets stored to steal
- Zero-trust networking - Compromised laptop couldn’t reach production
- Secret encryption - Separate key management from secret storage
- Access monitoring - Unusual access patterns would trigger alerts
- Just-in-time secrets - Temporary credentials limit damage window
Conclusion: Pragmatic Security Posture
CI/CD security is infrastructure security. The right level depends on your threat model:
For most teams (startups, internal tools, low-risk applications):
- Target: SLSA 2
- Store secrets in CI platform
- Pin dependencies
- Basic scanning
- Cost: ~$25,000/year
- Risk reduction: 80% of attacks prevented
For security-conscious teams (SaaS, regulated industries, customer data):
- Target: SLSA 3
- Workload identity
- Artifact signing
- Policy enforcement
- Isolated builds
- Cost: ~$125,000/year
- Risk reduction: 95% of attacks prevented
For critical infrastructure (finance, healthcare, defense, open source foundations):
- Target: SLSA 4
- Hermetic builds
- Reproducibility
- Advanced monitoring
- Dedicated security team
- Cost: ~$420,000/year
- Risk reduction: 99% of attacks prevented
The CircleCI breach showed that even major CI/CD platforms can be compromised. Defense in depth - combining technical controls with operational processes - provides the best protection.
Perfect security is impossible. Good-enough security prevents real-world attacks and limits damage when breaches occur. Start with basic controls, measure what matters, and invest in harder problems as your security maturity grows.