Agentic Change Management: Safer Automation at Scale

Week three. That's how long it took. We'd given our platform agent write access to the cluster, feeling pretty good about ourselves, and then a change landed in staging that nobody had reviewed. The agent had decided to scale down a deployment to save resources during low-traffic hours. Sensible logic. Terrible timing — a batch job was scheduled to run that night.

The deployment recovered quickly. But the moment it happened, something clicked: write access without a review gate isn't automation. It's a liability dressed up as automation.

So here's the pattern we settled on. Agents propose changes as PRs, humans decide whether to apply them, and rollback is always included before merge. Simple in principle. A bit more involved to actually build — which is what this post is about.

[Figure: C4 architecture diagram]


Quick takeaways

  • Agents should propose, not apply — the PR is the control boundary, and that boundary actually matters
  • Risk level determines the approval gate, not the tool that made the change
  • Every agent-generated PR includes a rollback section. Non-negotiable.
  • The audit trail lives in Git — which is exactly where the rest of your platform history already lives

The PR as evidence

With direct apply, a change lands in the cluster with no PR, no context, and no reviewer. The PR is more than a review mechanism: it's evidence that a decision was thought through.


Why direct apply is the wrong default

There's a tempting way to measure agentic success: how much less do humans have to do? That's the wrong question.

The right question is: when something goes wrong at 2am, can you understand what happened and reverse it in under five minutes? With direct apply, the answer is almost always no. Changes appear in the cluster with no PR, no context, no reviewer. You're not reviewing a decision — you're reconstructing one, in the dark, under pressure.

Rule: agents that can write to production infrastructure should always go through a PR, even if nobody's going to read it carefully. The PR isn't just a review mechanism. It's evidence.

The 2am test

Ask yourself: if this agent-made change caused an incident tonight, could someone who wasn't there understand what happened and reverse it in five minutes? If the answer is no, the change management model needs work — not the agent.


The three-tier approval model

Not all changes deserve the same friction. Here's how we split them:

| Risk level | Examples | Gate |
|---|---|---|
| Low | Updating a ConfigMap value, bumping a non-critical annotation | Auto-merge after 5 minutes if CI passes |
| Medium | Changing resource limits, scaling replicas, updating env vars | One human reviewer |
| High | Modifying RBAC, changing network policies, touching secrets config | Two reviewers + platform lead sign-off |

The agent classifies the change before opening the PR. If it can't classify confidently, it defaults to High. Better to over-gate than to under-gate.
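The three tiers can be expressed as a small lookup that a PR-opening script consults. This is a minimal sketch rather than the implementation behind this post: the `Gate` dataclass, its field names, and `gate_for` are hypothetical, and the values simply mirror the tiers listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Approval requirements for one risk tier (hypothetical structure)."""
    required_reviewers: int
    needs_platform_lead: bool
    auto_merge: bool
    auto_merge_delay_minutes: int = 0


# Values mirror the three-tier model; the names are illustrative only.
GATES = {
    "Low": Gate(required_reviewers=0, needs_platform_lead=False,
                auto_merge=True, auto_merge_delay_minutes=5),
    "Medium": Gate(required_reviewers=1, needs_platform_lead=False,
                   auto_merge=False),
    "High": Gate(required_reviewers=2, needs_platform_lead=True,
                 auto_merge=False),
}


def gate_for(risk_level: str) -> Gate:
    """Return the gate for a tier, falling back to High for unknown input."""
    return GATES.get(risk_level, GATES["High"])
```

Falling back to High for an unrecognised tier keeps the lookup consistent with the over-gate default.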

Apply this: default to High when uncertain

Build the agent's risk classifier to return High whenever confidence is below a threshold — say 80%. Over-gating a low-risk change costs a few minutes of reviewer time. Under-gating a high-risk change costs potentially hours of incident response. The asymmetry makes the default obvious.
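A sketch of that default, assuming the classifier returns a tier label plus a confidence between 0 and 1. The function name `effective_risk` and the constants are hypothetical; only the 80% figure comes from the text.

```python
VALID_TIERS = ("Low", "Medium", "High")
CONFIDENCE_THRESHOLD = 0.80  # the "say 80%" threshold suggested above


def effective_risk(predicted: str, confidence: float,
                   threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return the tier to gate on, escalating to High when unsure."""
    if predicted not in VALID_TIERS:
        return "High"  # unknown label: fail closed
    # Written as "not >=" so a NaN confidence also escalates to High.
    if not (confidence >= threshold):
        return "High"  # low confidence: over-gate rather than under-gate
    return predicted
```

The escalation only ever moves a change toward more review, so a miscalibrated classifier costs reviewer time rather than incident time.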


1) The agent PR template

Every agent-opened PR must include three things: what changed, why it was proposed, and how to roll it back. That last one especially. Here's the template we use:

<!-- .github/PULL_REQUEST_TEMPLATE/agent-change.md -->
## Agent-proposed change

**Classification:** [Low / Medium / High]  
**Confidence:** [e.g. 94%]  
**Agent:** [name of the agent/workflow that opened this]

### What this changes

<!-- One-paragraph description of the change and its expected effect -->

### Why this change was proposed

<!-- The signal or condition that triggered this: alert, drift detection, cost anomaly, etc. -->

### Rollback procedure

<!-- Step-by-step rollback that a human can execute without reading any other docs -->
```bash
# To revert this change:
kubectl rollout undo deployment/<name> -n <namespace>
# Or: git revert <commit> and push to trigger ArgoCD sync
```

### References

- Triggering event: [link to alert / PR / issue]
- Relevant runbook: [link if applicable]

This PR was opened automatically. Review before merging.
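
One way to enforce the template is a check that rejects PR bodies whose required sections are missing or still hold only the placeholder comment. This is a sketch under assumptions: `missing_sections` is a hypothetical helper, and the section titles are copied from the template headings.

```python
import re

REQUIRED_SECTIONS = (
    "What this changes",
    "Why this change was proposed",
    "Rollback procedure",
)


def missing_sections(pr_body: str) -> list[str]:
    """Return required template sections that are absent or empty."""
    # Drop HTML comments so an untouched placeholder does not count as content.
    body = re.sub(r"<!--.*?-->", "", pr_body, flags=re.DOTALL)
    # Splitting on "### " headings yields [preamble, title, text, title, text, ...].
    parts = re.split(r"^### +(.+?)\s*$", body, flags=re.MULTILINE)
    sections = {
        parts[i].strip(): parts[i + 1].strip()
        for i in range(1, len(parts) - 1, 2)
    }
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]
```

A CI job could fail the PR whenever this list is non-empty, which keeps the rollback section from being silently skipped.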

2) GitHub Actions workflow for agent PRs

```yaml
# .github/workflows/agent-change-pr.yml
name: Agent Change PR

on:
  workflow_dispatch:
    inputs:
      change_type:
        description: 'Type of change'
        required: true
        type: string
      target_resource:
        description: 'Kubernetes resource (kind/name/namespace)'
        required: true
        type: string
      change_payload:
        description: 'JSON payload describing the change'
        required: true
        type: string
      risk_level:
        description: 'Low / Medium / High'
        required: true
        type: string

jobs:
  open-change-pr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Apply change to branch
        env:
          # Pass inputs through env vars so their contents are never
          # interpolated into the shell script (avoids script injection
          # and breakage when the JSON payload contains quotes).
          TARGET_RESOURCE: ${{ inputs.target_resource }}
          CHANGE_PAYLOAD: ${{ inputs.change_payload }}
        run: |
          git checkout -b "agent/change-$(date +%Y%m%d-%H%M%S)"
          # apply the change to manifests in the repo
          python scripts/apply-agent-change.py \
            --resource "$TARGET_RESOURCE" \
            --payload "$CHANGE_PAYLOAD"

      - name: Open PR with appropriate reviewers
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          RISK_LEVEL: ${{ inputs.risk_level }}
          CHANGE_TYPE: ${{ inputs.change_type }}
        run: |
          python scripts/open-agent-pr.py \
            --risk-level "$RISK_LEVEL" \
            --change-type "$CHANGE_TYPE"

      - name: Set auto-merge for Low risk
        if: inputs.risk_level == 'Low'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          gh pr merge --auto --squash --delete-branch
```

Branch naming makes reviews faster

Name branches agent/change-YYYYMMDD-HHMMSS, so every agent branch shares the agent/change- prefix. Reviewers scanning the branch list immediately know what they're looking at, and can filter their review queue to agent branches when doing a batch review session.
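
A small sketch of the naming scheme and the filter it enables. Both helper names are hypothetical, and using UTC for the timestamp is an assumption (the workflow's `date` call uses whatever timezone the runner has).

```python
from datetime import datetime, timezone
from typing import Optional

AGENT_BRANCH_PREFIX = "agent/change-"


def agent_branch_name(now: Optional[datetime] = None) -> str:
    """Build a branch name like agent/change-20250102-030405."""
    now = now or datetime.now(timezone.utc)
    return AGENT_BRANCH_PREFIX + now.strftime("%Y%m%d-%H%M%S")


def agent_branches(branches: list[str]) -> list[str]:
    """Keep only agent-opened branches, oldest first."""
    # The fixed-width timestamp makes lexical order match chronological order.
    return sorted(b for b in branches if b.startswith(AGENT_BRANCH_PREFIX))
```

Because the timestamp is fixed-width, a plain sort is enough to put a batch review queue in the order the changes were proposed.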


3) Rollback as a required field

Honestly, this is the most common failure mode we've seen: changes with no clear rollback. Agents move fast, and the humans reviewing them often don't have enough context to write the rollback themselves — they're reading the PR cold.

So we make it required. If the agent can't generate a valid rollback procedure, it doesn't open the PR — it alerts a human instead. Full stop. For the policy layer that enforces what the agent can change in the first place, see Policy as Code + Agents.

# scripts/open-agent-pr.py (excerpt)
import json

import anthropic


def generate_rollback_procedure(change: dict) -> str:
    """Ask the model for a rollback procedure; raise if it cannot produce one."""
    # The client reads the API key from the ANTHROPIC_API_KEY environment variable.
    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"""Generate a concrete rollback procedure for this Kubernetes change.

Change: {json.dumps(change, indent=2, default=str)}

The rollback must be:
- Executable by a human in under 5 minutes
- Not require reading any other documentation
- Include the exact kubectl or git commands to run
- Note any side effects of the rollback

If you cannot generate a safe rollback procedure, respond with CANNOT_ROLLBACK and explain why."""
        }]
    )

    text = response.content[0].text
    # Fail closed: the caller should alert a human instead of opening the PR.
    if "CANNOT_ROLLBACK" in text:
        raise ValueError(f"Agent cannot generate rollback: {text}")
    return text

No rollback = no PR

If the agent returns CANNOT_ROLLBACK, the workflow should alert a human rather than opening the PR without a rollback section. A PR that says "rollback: TBD" is worse than no PR — it creates false confidence that the change is reversible when it might not be.
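
The branch between "open the PR" and "alert a human" might look like the sketch below. It assumes the rollback generator signals CANNOT_ROLLBACK by raising `ValueError`, as in the excerpt above; `open_pr_or_alert` and the injected `open_pr` and `alert_human` callables are hypothetical stand-ins for whatever PR and paging integrations are in use.

```python
from typing import Callable, Optional


def open_pr_or_alert(
    change: dict,
    generate_rollback: Callable[[dict], str],
    open_pr: Callable[[dict, str], str],
    alert_human: Callable[[dict, str], None],
) -> Optional[str]:
    """Open a PR only when a rollback exists; otherwise alert and return None."""
    try:
        rollback = generate_rollback(change)
    except ValueError as exc:
        # The generator raises ValueError when the model answers CANNOT_ROLLBACK.
        alert_human(change, str(exc))
        return None
    if not rollback.strip():
        # An empty rollback is treated the same as a missing one.
        alert_human(change, "empty rollback procedure")
        return None
    return open_pr(change, rollback)
```

Passing the integrations in as callables keeps the decision logic testable without touching GitHub or a paging system.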


4) Audit trail in Git

A few months in, something quietly useful happens. The Git history of agent-proposed PRs becomes a searchable record of everything the platform was doing and why. Each PR has:

  • A timestamp and the triggering event
  • The exact change that was applied
  • Who reviewed it — or the fact that it auto-merged after CI
  • A link back to the alert or metric that triggered the agent in the first place

That's what "auditable automation" actually means. Not a log file buried in a monitoring tool somewhere. A PR history, in the same repo, alongside every other platform change. The companion post on GitOps drift detection covers what happens when changes appear in the cluster without going through this process.
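
To make that history queryable, a script can condense PR records into audit rows. The sketch below assumes records shaped like the JSON that `gh pr list --json number,headRefName,mergedAt,reviews` returns; the exact field layout (especially inside `reviews`) is an assumption to verify against real output, and `audit_rows` is a hypothetical helper.

```python
def audit_rows(prs: list[dict]) -> list[dict]:
    """Summarise agent PRs: when they merged and who approved them."""
    rows = []
    for pr in prs:
        if not pr.get("headRefName", "").startswith("agent/change-"):
            continue  # not an agent-opened branch
        approvers = sorted({
            review["author"]["login"]
            for review in pr.get("reviews", [])
            if review.get("state") == "APPROVED"
        })
        rows.append({
            "number": pr["number"],
            "merged_at": pr.get("mergedAt"),
            "approved_by": approvers,
            # Assumption: a merged PR with no approvals went through auto-merge.
            "auto_merged": bool(pr.get("mergedAt")) and not approvers,
        })
    return rows
```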


Common objections

"This creates too much friction."
Only for High-risk changes — and look, those changes probably should have friction. Low-risk changes auto-merge after CI. Medium-risk changes need one reviewer. Neither of those is a meaningful bottleneck.

"The agents are slow enough without reviewing PRs manually."
Sure. But think about the alternative: direct apply with no review, and then, when something eventually goes wrong, you're debugging a change with zero context. The PR overhead pays for itself the first time you need to know why something changed and the answer is actually there.

"What if nobody reviews the Medium-risk PRs quickly?"
Set a time-to-review SLO for agent PRs. If it's breached, page the platform lead. Agents opening PRs that sit unreviewed for days is itself a signal — it means your governance model has a gap that needs fixing.


Frequently asked questions

What is agentic change management?

It's a governance pattern where AI agents propose infrastructure changes as pull requests rather than applying them directly. The key idea is separating "propose" from "apply", so every agent-driven change has a review gate, a rollback procedure, and an audit trail in Git before anything actually happens in the cluster.

How do you prevent AI agents from making unsafe changes in production?

The main control is the risk-tiered approval model. Low-risk changes auto-merge after CI passes. Medium-risk changes need one reviewer. High-risk changes need two reviewers plus platform lead sign-off. And if the agent can't confidently classify a change? It defaults to High. That's the safe direction to be wrong.

What should go in an agent PR template?

Three things, and honestly the third one matters most: what changed and what effect it's expected to have; why it was proposed (the triggering signal — the alert, the metric, whatever kicked this off); and a concrete rollback procedure that a human can execute in under five minutes without reading anything else. If the agent can't generate a valid rollback, it doesn't open the PR.

How do you roll back a change made by an AI agent?

Every agent PR includes a rollback section with the exact kubectl or git commands you need. Because the change lives in Git, reverting it is a git revert away — same as any other infrastructure change. The rollback procedure is a required field. Not optional, not best-effort.

What SLO should you set for reviewing agent PRs?

It depends on risk tier. Low-risk changes auto-merge, so no SLO is needed. Medium-risk: target review within 4 business hours, long enough to not interrupt flow and short enough that changes don't queue up. High-risk: target same-day review with a platform lead. If those targets aren't being met, that's a signal your change volume or staffing model needs adjusting.
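
A breach check for those targets could be sketched as follows. It simplifies deliberately: "4 business hours" is treated as 4 wall-clock hours and "same-day" as 8 hours, both assumptions of this sketch, and `slo_breached` is a hypothetical helper whose result would feed whatever paging hook is in place.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Review targets per tier, simplified to wall-clock time (see assumptions above).
REVIEW_SLO = {
    "Low": None,                   # auto-merges, so no review SLO
    "Medium": timedelta(hours=4),
    "High": timedelta(hours=8),
}


def slo_breached(risk_level: str, opened_at: datetime,
                 now: Optional[datetime] = None) -> bool:
    """Return True when an unreviewed agent PR has exceeded its review target."""
    # Unknown tiers are held to the High target, matching the over-gate default.
    target = REVIEW_SLO.get(risk_level, REVIEW_SLO["High"])
    if target is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - opened_at > target
```

Pass timezone-aware datetimes for `opened_at`; mixing naive and aware values raises a `TypeError` on subtraction.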


What you get

  • Every agent-driven change has a PR, a review gate, and a rollback procedure before it touches anything
  • Risk classification keeps low-stakes changes fast and high-stakes changes properly gated
  • The audit trail is in Git history, alongside everything else, not buried in some separate log system
  • When something goes wrong, you can find exactly what the agent did, why it did it, and how to reverse it
  • For AI work that stays traceable in the repo from the start, see Repo-Native AI Workflows

Walkthrough files

  • .github/PULL_REQUEST_TEMPLATE/agent-change.md — required fields for agent PRs
  • .github/workflows/agent-change-pr.yml — workflow that opens agent change PRs
  • scripts/apply-agent-change.py — applies the change to Git manifests
  • scripts/open-agent-pr.py — opens the PR with correct reviewers and labels

Want to go deeper on the guardrails that control what the agent can even propose? The Policy as Code + Agents post covers the Kyverno and OPA layer that sits underneath all of this.