CI/CD Pipeline Security: Enterprise Defense in Depth

At scale, CI/CD systems become critical infrastructure with hundreds or thousands of pipelines deploying to production daily. A compromise doesn’t just mean one bad deployment - it means an attacker with persistent access to your entire deployment mechanism.

This deep-water content addresses the hardest problems: hermetic builds, supply chain attack analysis, policy enforcement, zero-trust architectures, and incident response for sophisticated threats.

SLSA Level 4: Hermetic and Reproducible Builds

SLSA Level 4 is genuinely difficult. It requires hermetic builds where two builds from the same source code produce bit-for-bit identical artifacts. This matters for:

  • Verifying build integrity - If your binary doesn’t match the reference build, something’s wrong
  • Forensics - Reproduce the exact build to investigate compromises
  • Compliance - Some regulated industries require reproducible builds

What Makes a Build Hermetic?

A hermetic build has no external inputs except:

  1. Source code (at a specific commit)
  2. Explicitly declared dependencies (pinned versions)
  3. Build toolchain (compiler, runtime)

Everything else is eliminated:

  • No network access during build
  • No access to filesystem outside build directory
  • No ambient environment variables
  • No timestamps or build machine hostnames
  • No randomness in output

The Challenge: Timestamps and Non-Determinism

Many tools embed timestamps, file paths, or random data in build artifacts:

# Two builds of the same code produce different binaries:
$ make build
$ sha256sum ./output
abc123...

$ make build
$ sha256sum ./output
def456...  # Different!

Common sources of non-determinism:

  • Timestamps - Embedded __DATE__ macros in C/C++, modification times in archives
  • Build paths - Absolute paths baked into debug symbols
  • Randomness - ASLR offsets, random padding
  • Ordering - Non-deterministic hash table iteration, parallel build ordering
  • Environment - Username, hostname, timezone affecting build

Making Builds Reproducible

Strip timestamps:

# Instead of:
COPY . /app

# Use fixed timestamp:
COPY --chmod=0755 --chown=1000:1000 . /app
RUN find /app -exec touch -t 202301010000.00 {} +

Fix environment:

# Bazel build (Google's hermetic build system)
build --action_env=TZ=UTC
build --action_env=SOURCE_DATE_EPOCH=1672531200
build --incompatible_strict_action_env

Pin everything:

# Not reproducible - "latest" changes
FROM node:18

# Reproducible - specific digest
FROM node:18.17.1@sha256:a6385a6bb2fdcb7c48fc871e35e32af8daaa82c518f508a5f2424f988d60c6a9

# Pin build tools too
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      make=4.3-4.1 \
      git=1:2.30.2-1

Disable network during build:

# Dockerfile build with network isolation
RUN --network=none make build

With Docker BuildKit:

docker build --network=none -t myapp .

Verification

After implementing reproducibility:

# Build 1
docker build -t myapp:build1 .
docker save myapp:build1 > build1.tar
sha256sum build1.tar

# Build 2 (different machine, different time)
docker build -t myapp:build2 .
docker save myapp:build2 > build2.tar
sha256sum build2.tar

# Hashes should match exactly

Google’s Approach: Bazel

Bazel is designed for hermetic builds from the start:

# BUILD.bazel
go_binary(
    name = "myapp",
    srcs = ["main.go"],
    deps = [
        "@com_github_gin_gonic_gin//:gin",  # Pinned external dependency
    ],
    # Bazel ensures build is hermetic
)

Bazel advantages:

  • Sandboxed execution (build can’t access network or filesystem)
  • Content-based caching (rebuilds only what changed)
  • Remote execution (distribute build across machines)

Bazel disadvantages:

  • Steep learning curve
  • Requires rewriting build files
  • Ecosystem support varies by language

Is SLSA 4 Worth It?

When to invest:

  • Highly regulated industries (finance, healthcare, defense)
  • Open-source projects needing verifiable builds
  • Critical infrastructure (OS distributions, security tools)
  • Organizations with history of supply chain incidents

When SLSA 3 is enough:

  • Most SaaS companies
  • Internal tools
  • Fast-moving startups
  • Projects where slight non-determinism is acceptable

Google achieves SLSA 4 for most internal builds, but they have hundreds of engineers maintaining build infrastructure. For most organizations, SLSA 3 (isolated builds with signed provenance) provides the right security/complexity trade-off.

Supply Chain Attack Deep Dive

Understanding actual attack techniques helps design defenses.

Attack Vector 1: Malicious Dependencies

Sophisticated variant - Gradual compromise:

  1. Attacker publishes legitimate package to npm
  2. Builds reputation over months (stars, downloads, community)
  3. Version 1.0-1.5: Completely benign
  4. Version 1.6: Adds innocent-looking code with subtle backdoor
  5. Version 1.7: Backdoor activates only in production environments
  6. Version 1.8: Backdoor connects to C2 server only after 30 days

Detection is hard because:

  • Code review sees small incremental changes
  • Automated scanning doesn’t catch targeted logic
  • Trigger conditions avoid sandbox detection

Defense:

# Policy: Require security review for new dependencies
# .github/workflows/dependency-check.yml
name: Dependency Review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Check for new dependencies
        uses: actions/dependency-review-action@v3
        with:
          fail-on-severity: moderate
          # Custom policy: Any new dependency triggers manual review
          deny-licenses: GPL-3.0, AGPL-3.0

Behavioral analysis:

# Example: Detect suspicious behavior in dependencies
import subprocess
import sys

def check_dependency_behavior(package_name):
    """
    Run package installation in instrumented environment
    to detect suspicious behavior
    """
    suspicious_behaviors = [
        "network connections to unknown IPs",
        "filesystem access outside package directory",
        "spawning child processes",
        "environment variable enumeration",
        "crypto mining signatures"
    ]

    # Use sandboxed environment (Firejail, Docker, etc.)
    result = subprocess.run(
        ["firejail", "--net=none", "pip", "install", package_name],
        capture_output=True
    )

    # Analyze strace logs, network attempts, filesystem access
    # Flag anything suspicious for manual review

Attack Vector 2: Repository Compromise

Real example: CodeCov (2021)

Attackers modified CodeCov’s Bash uploader script:

# Original (legitimate)
curl -s https://codecov.io/bash | bash

# Attackers modified the script on CodeCov's servers
# Script exfiltrated environment variables (including CI secrets)
# Sent to attacker-controlled servers

29,000 customers potentially compromised because they piped remote scripts to bash.

Defense - Verified downloads:

# Never do this:
curl https://example.com/install.sh | bash

# Instead:
curl -O https://example.com/install.sh
curl -O https://example.com/install.sh.sha256

# Verify checksum
sha256sum -c install.sh.sha256

# Inspect script before running
less install.sh

# Run only after verification
bash install.sh

Better: Use package managers

# Instead of downloading scripts:
- name: Install tool
  run: |
    wget https://example.com/tool.tar.gz
    tar xzf tool.tar.gz

# Use versioned package:
- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.7  # Specific version

Attack Vector 3: Build Environment Persistence

Scenario:

  1. Attacker compromises shared build server
  2. Plants malicious script in /usr/local/bin/
  3. Script modifies binaries during compilation
  4. All subsequent builds are compromised
  5. Attack persists even after attacker access is removed

Example poisoned build:

# Attacker's malicious /usr/local/bin/gcc wrapper
#!/bin/bash
# Call real gcc
/usr/bin/gcc "$@"

# If building auth-related code, inject backdoor
if [[ "$@" == *"auth"* ]]; then
    # Inject malicious code into binary
    /usr/local/bin/inject-backdoor $output_file
fi

Defense - Immutable build infrastructure:

# Don't use long-lived build servers
# Use ephemeral environments

# GitHub Actions (isolated by default)
runs-on: ubuntu-latest  # Fresh VM every time

# Self-hosted runners - use VM snapshots
runs-on: self-hosted
# Revert to clean snapshot after each build

For self-hosted infrastructure:

# Terraform: Immutable build workers
resource "aws_instance" "build_worker" {
  ami           = var.builder_ami  # Golden image
  instance_type = "c5.xlarge"

  # Instance terminates after use
  instance_initiated_shutdown_behavior = "terminate"

  # Tag for lifecycle management
  tags = {
    Purpose = "ephemeral-build-worker"
    TTL     = "2h"  # Auto-terminate after 2 hours
  }
}

Attack Vector 4: Dependency Confusion

Advanced variant - Targeting scoped packages:

// package.json
{
  "dependencies": {
    "@mycompany/core-utils": "^1.0.0"
  }
}

If you forget to configure registry for @mycompany scope:

# Checks private registry - not found
# Falls back to public npm
# Attacker has published @mycompany/core-utils to public npm
# Malicious package gets installed
npm install

Comprehensive defense:

# .npmrc - Lock down all scopes
@mycompany:registry=https://npm.pkg.github.com/
@mycompany:_authToken=${GH_TOKEN}

# Block access to public registry for scoped packages
registry=https://registry.npmjs.org/
@mycompany:registry=https://npm.pkg.github.com/

# Verify packages match expected signatures
strict-ssl=true

Organization-level protection:

# Registry-side defense: Reserve namespace
# Contact npm Enterprise, GitHub Packages, or Artifactory
# Request namespace reservation for @mycompany
# Only your organization can publish packages under that scope

Policy Enforcement with OPA and Kyverno

At enterprise scale, you can’t manually review every deployment. Policy-as-code enforces security requirements automatically.

Open Policy Agent (OPA)

OPA is a general-purpose policy engine. You write policies in Rego (a declarative language), and OPA evaluates them.

Use case: Verify container images are signed

# policy.rego
package kubernetes.admission

import future.keywords.if

deny[msg] if {
    input.request.kind.kind == "Pod"
    image := input.request.object.spec.containers[_].image

    # Check if image signature exists in Rekor
    not image_is_signed(image)

    msg := sprintf("Image %v is not signed", [image])
}

image_is_signed(image) if {
    # Query Rekor transparency log
    # Verify signature exists for this image digest
    # Implementation depends on your signing infrastructure
    signature := http.send({
        "method": "GET",
        "url": sprintf("https://rekor.sigstore.dev/api/v1/log/entries?logIndex=%v", [image])
    })

    signature.status_code == 200
}

Integration with Kubernetes:

# Install OPA Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml

# Apply constraint template
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: requiresignedimages
spec:
  crd:
    spec:
      names:
        kind: RequireSignedImages
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package requiresignedimages

        violation[{"msg": msg}] {
            container := input.review.object.spec.containers[_]
            not is_signed(container.image)
            msg := sprintf("Container image %v is not signed", [container.image])
        }

        is_signed(image) {
            # Call out to Cosign verification
            # Or check internal database of signed images
        }

Result: Unsigned images can’t deploy, even if someone bypasses code review.

Kyverno (Kubernetes-Native Alternative)

Kyverno uses YAML for policies instead of Rego, making it more accessible.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: enforce
  background: false
  webhookTimeoutSeconds: 30
  failurePolicy: Fail
  rules:
    - name: verify-signature
      match:
        any:
        - resources:
            kinds:
              - Pod
      verifyImages:
      - imageReferences:
        - "myregistry.io/*"
        attestors:
        - count: 1
          entries:
          - keyless:
              subject: "https://github.com/myorg/myrepo/.github/workflows/*"
              issuer: "https://token.actions.githubusercontent.com"
              rekor:
                url: https://rekor.sigstore.dev

This policy verifies every image from myregistry.io was signed by your GitHub Actions workflows.

Policy Use Cases

1. Enforce SLSA provenance:

# Require all production deployments have SLSA attestation
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-slsa-attestation
spec:
  rules:
  - name: check-attestation
    match:
      resources:
        kinds:
        - Deployment
        namespaces:
        - production
    verifyImages:
    - attestations:
      - predicateType: https://slsa.dev/provenance/v0.2
        conditions:
        - all:
          - key: "{{ builder.id }}"
            operator: In
            value: ["https://github.com/myorg/myrepo/.github/workflows/build.yml@refs/heads/main"]

2. Block images with known vulnerabilities:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-vulnerable-images
spec:
  rules:
  - name: check-vulnerabilities
    match:
      resources:
        kinds:
        - Pod
    preconditions:
      all:
      - key: "{{ request.operation }}"
        operator: In
        value: [CREATE, UPDATE]
    validate:
      message: "Image has critical vulnerabilities"
      foreach:
      - list: "request.object.spec.containers"
        deny:
          conditions:
            any:
            - key: "{{ scan_result(element.image).critical_vulns }}"
              operator: GreaterThan
              value: 0

3. Enforce least privilege:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-non-root
spec:
  rules:
  - name: check-containers-not-root
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Containers must not run as root"
      pattern:
        spec:
          containers:
          - securityContext:
              runAsNonRoot: true

Zero-Trust CI/CD Architecture

Traditional CI/CD assumes trust within the build network. Zero-trust assumes everything is potentially compromised.

Principles

  1. Verify explicitly - Always authenticate and authorize, never assume trust
  2. Least privilege - Minimal access needed for each job
  3. Assume breach - Design for compromise containment

Reference Architecture

┌─────────────────────────────────────────────────────────────┐
│ Developer Workstation                                       │
│                                                             │
│ 1. git push (signed commit)                                 │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Source Repository (GitHub/GitLab)                           │
│                                                             │
│ 2. Webhook triggers pipeline                                │
│ 3. OIDC token issued to workflow                            │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Isolated Build Environment (Ephemeral VM)                   │
│                                                             │
│ 4. Verify commit signature                                  │
│ 5. Fetch code (read-only)                                   │
│ 6. Build in sandbox (no network)                            │
│ 7. Generate SLSA provenance                                 │
│ 8. Sign artifact with OIDC identity                         │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Artifact Storage (OCI Registry)                             │
│                                                             │
│ 9. Store signed artifact + provenance                       │
│ 10. Transparency log entry (Rekor)                          │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Policy Gate (OPA/Kyverno)                                   │
│                                                             │
│ 11. Verify signature                                        │
│ 12. Check SLSA provenance                                   │
│ 13. Validate vulnerability scan results                     │
│ 14. Enforce deployment policies                             │
└────────────────────┬────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ Production Environment                                      │
│                                                             │
│ 15. Pull verified artifact                                  │
│ 16. Deploy with workload identity                           │
│ 17. Runtime attestation verification                        │
└─────────────────────────────────────────────────────────────┘

Key Security Controls

Identity-based access:

# No long-lived credentials anywhere
# Build uses OIDC:
- AWS IAM role trusts GitHub OIDC provider
- Role policy specifies: "only main branch from specific repo"
- Temporary credentials issued for job duration
- Credentials expire after 1 hour

# Deployment uses workload identity:
- Kubernetes ServiceAccount mapped to cloud IAM role
- Pod can only access resources its role permits
- No secrets stored in cluster

Network segmentation:

# Build network cannot reach production
# Deployment network is separate
# Strict firewall rules between segments

# Example: AWS Security Groups
resource "aws_security_group" "build_network" {
  egress {
    # Build can access artifact registry
    to_port   = 443
    protocol  = "tcp"
    cidr_blocks = [var.registry_cidr]
  }

  # No ingress allowed
  # No other egress allowed
}

resource "aws_security_group" "production_network" {
  # Production cannot be reached from build network
  ingress {
    from_port = 443
    to_port   = 443
    protocol  = "tcp"
    cidr_blocks = [var.user_traffic_cidr]
  }
}

Monitoring and alerting:

# Detect unusual activity
alerts:
  - name: UnusualBuildBehavior
    query: |
      # Alert if build job makes unexpected network calls
      network_connections{job="build"} and not (
        destination in ["registry.io", "github.com"]
      )

  - name: AnomalousDeployment
    query: |
      # Alert if deployment happens outside business hours
      deployment_created{hour < 8 or hour > 18}

  - name: PrivilegeEscalation
    query: |
      # Alert if pipeline role assumes higher privileges
      iam_assume_role{target_role=~".*admin.*"}

Incident Response Playbook

Scenario 1: Compromised CI/CD Secrets

Detection triggers:

  • Cloud provider alerts about API calls from unusual IPs
  • Unexpected deployments to production
  • Third-party services report unauthorized access
  • Security scanning detects malicious code in recent build

Response procedure:

Phase 1: Contain (First 30 minutes)

# 1. Immediately revoke compromised credentials
aws iam delete-access-key --access-key-id AKIA...

# 2. Suspend all pipeline executions
gh api -X POST /repos/OWNER/REPO/actions/runs/RUN_ID/cancel

# 3. Block network access from build infrastructure
aws ec2 revoke-security-group-ingress --group-id sg-... --cidr 0.0.0.0/0

# 4. Enable enhanced logging
aws cloudtrail create-trail --name incident-response-trail --s3-bucket-name forensics-bucket

Phase 2: Investigate (Hours 1-4)

# Audit all deployments in compromise window
gh api /repos/OWNER/REPO/actions/runs \
  --jq '.workflow_runs[] | select(.created_at > "2024-01-01T00:00:00Z") | {id, name, head_sha, created_at}'

# Check what credentials accessed
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIA... \
  --max-items 1000 \
  --output json > cloudtrail-forensics.json

# Analyze API calls
cat cloudtrail-forensics.json | jq '.Events[] | {time: .EventTime, event: .EventName, resource: .Resources, ip: .SourceIPAddress}'

# Check for data exfiltration
aws s3api list-objects --bucket production-data --query 'Contents[?LastModified>`2024-01-01`]'

Phase 3: Eradicate (Hours 4-8)

# Rotate all secrets (not just compromised one)
# Assumption: If one leaked, others might have too

# Generate new credentials
aws iam create-access-key --user-name ci-deployment-user

# Update CI platform with new secrets
gh secret set AWS_ACCESS_KEY_ID --body "$NEW_KEY_ID"
gh secret set AWS_SECRET_ACCESS_KEY --body "$NEW_SECRET"

# Delete old credentials
aws iam delete-access-key --access-key-id AKIA_OLD...

# Review and potentially rebuild recent deployments
for sha in $(git log --since="2024-01-01" --format=%H); do
    # Verify commit signature
    git verify-commit $sha

    # Check if build artifacts are signed
    cosign verify myregistry.io/myapp:$sha

    # If verification fails, rebuild from source
    ./rebuild.sh $sha
done

Phase 4: Recovery (Hours 8-24)

# Deploy known-good version
# From before compromise started
git checkout KNOWN_GOOD_SHA
./deploy.sh production

# Restore monitoring and automation
# With enhanced security controls

# Implement lessons learned:
# - Move to workload identity (no stored secrets)
# - Add policy gates (OPA/Kyverno)
# - Enable artifact signing
# - Increase audit logging

Phase 5: Post-Incident (Week 1-2)

# Incident Report Template

## Summary
- **Date**: 2024-01-15
- **Duration**: 8 hours (detection to full recovery)
- **Impact**: Unauthorized access to production AWS account
- **Root cause**: CI/CD secret exposed in repository history

## Timeline
- 00:00: Attacker discovers AWS credentials in old commit
- 02:30: Unusual API calls detected by CloudWatch
- 03:00: Security team alerted
- 03:15: Credentials revoked
- 04:00: Investigation begins
- 08:00: All secrets rotated, systems recovered
- 12:00: Enhanced monitoring deployed

## What went well
- Detection within 2.5 hours
- Rapid credential revocation
- No customer data accessed

## What went poorly
- Secret was in Git history for 6 months before detection
- Manual secret rotation took 4 hours (should be automated)
- No policy preventing secrets in code

## Action items
- [x] Implement pre-commit hooks to block secrets
- [x] Enable GitHub secret scanning
- [x] Migrate to workload identity (no long-lived credentials)
- [x] Automate secret rotation
- [ ] Conduct tabletop exercise for next incident

Scenario 2: Malicious Code in Production

Detection:

  • Security team reports backdoor in production deployment
  • Unexpected network traffic from production to unknown IPs
  • Customer reports suspicious behavior

Immediate response:

# 1. Identify malicious deployment
kubectl get deployments -n production -o json | \
  jq '.items[] | {name, image, created: .metadata.creationTimestamp}'

# 2. Rollback to last known-good version
kubectl rollout undo deployment/myapp -n production

# 3. Block malicious image
cat <<EOF | kubectl apply -f -
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-compromised-image
spec:
  validationFailureAction: enforce
  rules:
  - name: block-bad-image
    match:
      resources:
        kinds:
        - Pod
    validate:
      message: "Compromised image blocked"
      deny:
        conditions:
        - key: "{{ images.*.digest }}"
          operator: Equals
          value: "sha256:MALICIOUS_DIGEST"
EOF

# 4. Quarantine affected nodes
kubectl cordon node-1 node-2 node-3
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

Forensic analysis:

# Extract malicious image for analysis
docker pull myregistry.io/myapp@sha256:MALICIOUS_DIGEST
docker save myregistry.io/myapp@sha256:MALICIOUS_DIGEST > malicious-image.tar

# Analyze image layers
docker history myregistry.io/myapp@sha256:MALICIOUS_DIGEST

# Extract filesystem
mkdir image-contents
tar -xf malicious-image.tar -C image-contents

# Search for suspicious files
find image-contents -name "*.sh" -o -name "*.elf" | xargs strings | grep -i "backdoor\|shell\|payload"

# Check network connections if container ran
# Get pod that ran the malicious image
kubectl logs -n production POD_NAME --previous | grep -E "connect|socket|bind"

Determine scope:

# How did malicious code get into pipeline?
# Check build logs
gh api /repos/OWNER/REPO/actions/runs/RUN_ID/logs

# Review recent pipeline changes
git log --since="1 month ago" -- .github/workflows/

# Check for compromised dependencies
npm audit --json > audit.json
cat audit.json | jq '.vulnerabilities | to_entries[] | select(.value.severity=="critical")'

Economic Analysis of CI/CD Security

Cost of Prevention

Baseline security (SLSA 1-2):

  • Engineering time: 40 hours initial setup
  • Tooling: $0-500/month (GitHub Actions, open source tools)
  • Maintenance: 10 hours/month
  • Annual cost: ~$20,000 (engineering time) + $6,000 (tools) = $26,000

Advanced security (SLSA 3):

  • Engineering time: 200 hours initial (workload identity, policy enforcement, signing)
  • Tooling: $2,000/month (Sigstore infrastructure, policy engines, monitoring)
  • Maintenance: 40 hours/month
  • Annual cost: ~$100,000 (engineering time) + $24,000 (tools) = $124,000

Enterprise security (SLSA 4):

  • Engineering time: 1,000 hours initial (hermetic builds, reproducibility)
  • Dedicated security team: 2 FTEs
  • Tooling: $10,000/month (enterprise tools, support contracts)
  • Annual cost: ~$300,000 (team) + $120,000 (tools) = $420,000

Cost of Breach

CircleCI breach (2022-2023):

  • Customer engineering time rotating secrets: 29,000 customers × 8 hours = 232,000 hours
  • At $100/hour: $23,200,000 in customer labor
  • CircleCI’s costs: Investigation, incident response, customer support, reputation damage
  • Estimated total: $50-100M

SolarWinds breach (2020):

  • Remediation costs: >$100M
  • Legal fees, settlements: Unknown (ongoing)
  • Reputation damage: Stock price dropped 25%
  • Customer trust: Permanent impact

CodeCov breach (2021):

  • 29,000 potentially compromised customers
  • Estimated customer cost to rotate credentials: $10-50M aggregate
  • CodeCov’s reputation: Significant damage

Break-Even Analysis

If your organization has:

  • 100 engineers ($100/hour)
  • 20 services in production
  • SaaS business with $50M ARR

Cost of breach (conservative):

  • Engineering time investigating: 100 engineers × 40 hours = $400,000
  • Credential rotation: 200 hours across teams = $20,000
  • Customer notification: 40 hours = $4,000
  • Reputation damage: 10% customer churn = $5,000,000
  • Total: ~$5.4M

Probability of breach over 5 years:

  • Without CI/CD security: ~15% (based on industry data)
  • With SLSA 2: ~5%
  • With SLSA 3: ~1%

Expected value:

  • No security: 0.15 × $5.4M = $810,000 expected loss
  • SLSA 2: 0.05 × $5.4M = $270,000 expected loss + $26,000 annual cost = $296,000
  • SLSA 3: 0.01 × $5.4M = $54,000 expected loss + $124,000 annual cost = $178,000

Result: SLSA 3 pays for itself if there’s even a 1% chance of breach over 5 years.

Implementation Roadmap for Enterprise Deployment

Months 1-3: Foundation

Goals:

  • Eliminate secrets from code
  • Enable dependency scanning
  • Implement basic access controls

Tasks:

  • Audit all repositories for committed secrets
  • Implement secret scanning (GitHub Advanced Security, git-secrets)
  • Move secrets to CI platform secret managers
  • Enable Dependabot or equivalent
  • Set up CODEOWNERS for pipeline files
  • Pin third-party actions to commit hashes
  • Implement branch protection rules

Success metrics:

  • Zero secrets in new commits
  • 100% of repositories have dependency scanning
  • All workflow changes require review

Months 4-6: Workload Identity

Goals:

  • Eliminate long-lived credentials
  • Implement OIDC authentication

Tasks:

  • Configure OIDC trust in AWS/GCP/Azure
  • Create IAM roles for each workflow
  • Migrate workflows to workload identity
  • Test credential expiration and renewal
  • Rotate and delete old long-lived credentials
  • Document OIDC setup for new services

Success metrics:

  • 80% of workflows using workload identity
  • Zero high-privilege long-lived credentials
  • Automated credential lifecycle

Months 7-9: Artifact Signing

Goals:

  • Achieve SLSA Level 2
  • Implement artifact signing

Tasks:

  • Set up Sigstore infrastructure (Cosign, Rekor)
  • Integrate signing into build pipelines
  • Generate SLSA provenance for builds
  • Implement verification in deployment
  • Create policy requiring signatures for production
  • Train teams on verification

Success metrics:

  • 100% of production deployments signed
  • Provenance available for all artifacts
  • Automated verification before deploy

Months 10-12: Policy Enforcement

Goals:

  • Implement admission control
  • Enforce security policies

Tasks:

  • Deploy OPA Gatekeeper or Kyverno
  • Create policies for image signing
  • Implement vulnerability scanning gates
  • Enforce SLSA provenance requirements
  • Set up policy violation alerting
  • Document policy exceptions process

Success metrics:

  • Unsigned images cannot deploy
  • High/critical vulnerabilities blocked
  • Policy violations < 5/month

Months 13-18: Advanced Hardening

Goals:

  • Achieve SLSA Level 3
  • Implement hermetic builds

Tasks:

  • Migrate to isolated build environments
  • Implement reproducible builds
  • Set up build caching and distribution
  • Network isolation for builds
  • Immutable build infrastructure
  • Continuous compliance monitoring

Success metrics:

  • Build reproducibility: 95%
  • Build isolation: 100%
  • Zero build contamination incidents

Ongoing: Monitoring and Improvement

Continuous tasks:

  • Weekly security scanning review
  • Monthly policy effectiveness review
  • Quarterly threat modeling updates
  • Annual penetration testing
  • Ongoing team training
  • Incident response drills

Lessons from Real Breaches

SolarWinds (2020): Build System Compromise

What happened:

  • Attackers compromised SolarWinds’ build system
  • Injected malicious code into Orion software
  • Malware distributed via trusted software updates
  • Affected 18,000+ customers including US government agencies

How it worked:

  • Attackers gained access to build environment
  • Modified source code before compilation
  • Signed malicious binaries with SolarWinds’ legitimate certificate
  • Updates distributed through normal channels

What would have prevented it:

  1. Hermetic builds - External code couldn’t have been injected
  2. Reproducible builds - Someone could have detected the mismatch
  3. Code signing with transparency logs - Anomalous signatures would have been visible
  4. Network isolation - Build environment couldn’t communicate with attacker C2

CodeCov (2021): Script Modification

What happened:

  • Attacker modified CodeCov’s Bash uploader script
  • Script ran in customer CI/CD pipelines
  • Exfiltrated environment variables (secrets)
  • 29,000 customers potentially affected

How it worked:

  • Attacker gained access to CodeCov’s infrastructure
  • Modified script served by codecov.io
  • Customers ran curl https://codecov.io/bash | bash
  • Script sent environment variables to attacker

What would have prevented it:

  1. Verified downloads - Checksum verification would have detected tampering
  2. Avoid piping to bash - Use package managers instead
  3. Secret scoping - Limited environment variables to necessary jobs
  4. Network monitoring - Unusual outbound connections would trigger alerts

CircleCI (2022-2023): Secret Exfiltration

What happened:

  • Attacker compromised employee laptop
  • Escalated to production systems
  • Stole customer secrets from CircleCI’s secret storage
  • Customers forced to rotate thousands of credentials

How it worked:

  • Employee laptop compromised (likely malware)
  • Attacker moved laterally through CircleCI network
  • Accessed production secret storage
  • Exfiltrated encrypted secrets and decryption keys

What would have prevented it:

  1. Workload identity - No secrets stored to steal
  2. Zero-trust networking - Compromised laptop couldn’t reach production
  3. Secret encryption - Separate key management from secret storage
  4. Access monitoring - Unusual access patterns would trigger alerts
  5. Just-in-time secrets - Temporary credentials limit damage window

Conclusion: Pragmatic Security Posture

CI/CD security is infrastructure security. The right level depends on your threat model:

For most teams (startups, internal tools, low-risk applications):

  • Target: SLSA 2
  • Store secrets in CI platform
  • Pin dependencies
  • Basic scanning
  • Cost: ~$25,000/year
  • Risk reduction: 80% of attacks prevented

For security-conscious teams (SaaS, regulated industries, customer data):

  • Target: SLSA 3
  • Workload identity
  • Artifact signing
  • Policy enforcement
  • Isolated builds
  • Cost: ~$125,000/year
  • Risk reduction: 95% of attacks prevented

For critical infrastructure (finance, healthcare, defense, open source foundations):

  • Target: SLSA 4
  • Hermetic builds
  • Reproducibility
  • Advanced monitoring
  • Dedicated security team
  • Cost: ~$420,000/year
  • Risk reduction: 99% of attacks prevented

The CircleCI breach showed that even major CI/CD platforms can be compromised. Defense in depth - combining technical controls with operational processes - provides the best protection.

Perfect security is impossible. Good-enough security prevents real-world attacks and limits damage when breaches occur. Start with basic controls, measure what matters, and invest in harder problems as your security maturity grows.

You've finished reading this deep-water level content