GitOps Policy‑as‑Code with Argo CD + Kyverno

We caught it in a post-incident review. A deployment had been running in production for three weeks without resource limits — not because anyone removed them deliberately, but because a PR that skipped them went through the GitOps workflow exactly like a compliant one. ArgoCD synced it. The CI pipeline passed. Nobody noticed until a memory spike took the pod down during a load test.

GitOps solves the consistency problem beautifully. Everything in Git, every change a PR, every deploy auditable. What it doesn't solve is the correctness problem. A PR that removes resource limits, exposes a service to the public internet, or skips required labels goes through the exact same workflow as a perfectly compliant one. ArgoCD doesn't know the difference — it syncs what Git says.

Kyverno is how you add the correctness layer. It's a Kubernetes-native policy engine — policies are Kubernetes resources, no Rego required — and it runs in the cluster as an admission controller. Pair it with ArgoCD and you get deployment consistency and policy enforcement, without adding a separate policy system that lives outside your GitOps workflow.

Quick takeaways

  • Kyverno runs as a Kubernetes admission controller — when ArgoCD tries to sync a resource, Kyverno validates it before it reaches the cluster. A violation fails the sync with a specific policy message, not a cryptic error.
  • Start everything in audit mode — violations are logged but resources are still admitted. Audit first to see the blast radius, then graduate to enforce.
  • Validate, mutate, and generate are the three policy types. Validate is most common; mutate is underused and powerful for injecting defaults transparently; generate automates namespace bootstrapping.
  • Policies live in Git like everything else — they're Kubernetes resources synced by ArgoCD. The policy change is a PR, the change history is a git log.
  • The feedback loop gets fast — developers see policy violations in the same PR cycle rather than finding them in a post-deploy audit weeks later.

[Figure: Kyverno + ArgoCD policy enforcement flow]

How Kyverno fits with ArgoCD

The integration is simpler than it sounds. Kyverno runs as a webhook in the cluster. When ArgoCD tries to sync a resource, Kubernetes routes the admission request through Kyverno before accepting it. If the resource violates a policy, the sync fails with a clear error — not a cryptic cluster error, an explicit policy violation message.

From ArgoCD's perspective, the sync failed. From Kyverno's perspective, it did exactly what it was supposed to. From the developer's perspective, they get fast, specific feedback: "this deployment is missing required labels" rather than "sync failed, go check the cluster".

The key decision is audit vs enforce mode. Kyverno supports both:

  • Audit mode — violations are logged and surfaced as policy reports, but the resource is still admitted. Use this when introducing new policies into an existing cluster — you want to see the blast radius before you start blocking.
  • Enforce mode — violations are rejected at admission. The resource never reaches the cluster. Use this for policies you're confident about and ready to enforce.
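
In the manifest, the mode is a single field. A minimal sketch of the relevant part of a policy spec, assuming the policy-level validationFailureAction field used in the examples in this post (newer Kyverno releases also accept a per-rule validate.failureAction, so check the docs for the version you run):

```yaml
# Fragment of a ClusterPolicy spec; the rules are omitted here.
spec:
  # Audit: record violations in policy reports and admit the resource anyway.
  # Change to Enforce in a follow-up PR once the reports are clean.
  validationFailureAction: Audit
  # Also evaluate resources that already exist, not only new admission requests.
  background: true
```

Keeping the mode change as its own one-line PR makes the switch easy to review and easy to revert.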

Never flip to enforce before clearing violations

Switching a policy from audit to enforce on a busy cluster without first clearing existing violations means a lot of sync failures all at once. Run kubectl get policyreport -A and fix or exempt violations before you flip. This is the step most teams skip and then regret.


The three policy types

This post relies on three kinds of Kyverno rule: validate, mutate, and generate.

Validate is the most common. A validate policy checks incoming resources against rules and either passes or fails them. This is where you enforce resource limits, required labels, approved registries, and security context rules:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce  # use Audit while rolling out, then switch
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Resource limits are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"

Mutate is underused but powerful. A mutate policy modifies resources before they're admitted — adding missing labels, injecting sidecar containers, setting default values. If you want to ensure all pods get a specific annotation without requiring every team to add it manually, a mutate policy does that transparently.
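
As a rough illustration, a mutate policy that adds a default annotation might look like the sketch below. The policy name, the annotation key platform.example.com/owner, and its value are hypothetical; the +(...) anchor tells Kyverno to add the key only when it is absent, so values that teams set themselves are left alone.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-owner-annotation   # hypothetical name
spec:
  rules:
    - name: add-owner-annotation
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            annotations:
              # "+( )" is Kyverno's add-if-not-present anchor.
              +(platform.example.com/owner): "unassigned"
```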

Mutate policies: the underrated power move

Mutate policies inject defaults silently — teams get compliant resources without knowing what "correct" looks like. This is how you enforce platform standards without creating a compliance burden for every squad deploying services. Use mutate for anything that's a sensible default rather than a hard requirement.

Generate creates new resources when a matching resource is created. The common pattern: when a new Namespace is created, automatically generate a NetworkPolicy, a ResourceQuota, and a default LimitRange. Teams get a properly configured namespace without having to know what the right defaults are.
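
A minimal sketch of the NetworkPolicy half of that pattern, assuming a default-deny policy is the baseline you want in every new namespace (the policy and resource names are illustrative, and Kyverno may need extra RBAC permissions to create the generated resource):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-deny-networkpolicy   # illustrative name
spec:
  rules:
    - name: generate-default-deny
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: default-deny
        # Create the NetworkPolicy inside the namespace that triggered the rule.
        namespace: "{{request.object.metadata.name}}"
        # Keep the generated resource in line with this definition if it is edited or deleted.
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Ingress
              - Egress
```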


Practical policies to start with

Required labels on all workloads. Team, environment, and cost-centre labels are the minimum for operational visibility. Without them, you can't route alerts, attribute costs, or answer "who owns this" quickly.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-labels
spec:
  validationFailureAction: Enforce  # use Audit while rolling out, then switch
  rules:
    - name: check-required-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
                - DaemonSet
      validate:
        message: "Workloads must have 'team', 'environment', and 'cost-centre' labels."
        pattern:
          metadata:
            labels:
              team: "?*"
              environment: "?*"
              cost-centre: "?*"

Restrict image registries. Only allow images from your internal registry or approved external sources. This catches docker.io/somethingsketch:latest before it gets into your cluster.
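
A sketch of what that could look like, assuming a single approved registry at the hypothetical host registry.example.com. Further sources can be added to the pattern with " | ", and the =(initContainers) anchor applies the check to init containers only when a pod has any.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Audit   # start in audit, switch to Enforce later
  background: true
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            # Checked only if the pod defines init containers.
            =(initContainers):
              - image: "registry.example.com/*"
            containers:
              - image: "registry.example.com/*"
```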

No latest tags in production. Images tagged latest are a drift risk — what's latest today isn't what's latest in six months. Enforce explicit version tags in production namespaces.
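
One way to express this, assuming production namespaces follow a hypothetical prod-* naming convention. The first rule requires an explicit tag, because an untagged image resolves to latest; the second rejects the latest tag itself.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag-in-production
spec:
  validationFailureAction: Audit   # start in audit, switch to Enforce later
  background: true
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - "prod-*"   # hypothetical naming convention
      validate:
        message: "Images in production must specify a tag."
        pattern:
          spec:
            containers:
              - image: "*:*"
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - "prod-*"
      validate:
        message: "The 'latest' tag is not allowed in production."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```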

Auto-generate namespace guardrails. When a new namespace is created with a team label, generate a ResourceQuota and LimitRange automatically. Teams get reasonable defaults without knowing what the right values are.
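
A sketch of such a policy, with placeholder quota and limit values that you would tune for your own cluster. The selector restricts the rules to namespaces that carry a team label, whatever its value.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-namespace-guardrails   # illustrative name
spec:
  rules:
    - name: generate-resourcequota
      match:
        any:
          - resources:
              kinds:
                - Namespace
              selector:
                matchExpressions:
                  - key: team
                    operator: Exists
      generate:
        apiVersion: v1
        kind: ResourceQuota
        name: default-resourcequota
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            hard:   # placeholder values
              requests.cpu: "4"
              requests.memory: 16Gi
              limits.cpu: "8"
              limits.memory: 32Gi
    - name: generate-limitrange
      match:
        any:
          - resources:
              kinds:
                - Namespace
              selector:
                matchExpressions:
                  - key: team
                    operator: Exists
      generate:
        apiVersion: v1
        kind: LimitRange
        name: default-limitrange
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            limits:
              - type: Container
                default:          # limits applied when a container sets none
                  cpu: 500m
                  memory: 1Gi
                defaultRequest:   # requests applied when a container sets none
                  cpu: 200m
                  memory: 256Mi
```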

Apply this starter policy set

These four validate policies (resource limits, team labels, registry restrictions, no-latest-in-production) cover the most common compliance gaps in Kubernetes clusters. Start them in audit mode, run summarise-violations.py (in the companion repo), fix the violations, then flip each to enforce one at a time. The namespace guardrails policy is a generate rule, so the audit/enforce switch doesn't apply to it; try it on a test namespace first.


Shipping the policies via GitOps

This is where it comes together. Your Kyverno ClusterPolicies live in your GitOps repository just like everything else. ArgoCD syncs them. Kyverno enforces them.

The workflow:

  1. Propose a new policy as a PR to the policy repository
  2. Review and merge — ArgoCD syncs the ClusterPolicy to the cluster
  3. The policy ships with validationFailureAction: Audit, so Kyverno reports existing violations in policy reports without blocking anything
  4. Fix violations, switch to enforce mode in a follow-up PR
  5. All future resources that violate the policy are rejected at admission

The audit trail is complete. The policy change is a git commit. The violation history is in Kyverno's policy reports. Rollback is reverting the ClusterPolicy PR.
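
For illustration, the Argo CD side can be a single Application pointing at the policy directory. The repository URL, path, and names below are hypothetical; the sketch assumes Argo CD runs in the argocd namespace and deploys to the cluster it runs in.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kyverno-policies
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/policies.git   # hypothetical repo
    targetRevision: main
    path: policies                                           # hypothetical path
  destination:
    # ClusterPolicies are cluster-scoped, so no destination namespace is set.
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true      # a policy deleted from Git is removed from the cluster
      selfHeal: true   # manual edits to policies in the cluster are reverted
```

With selfHeal on, an out-of-band kubectl edit that loosens a policy gets reverted to what Git says, which keeps the policy set itself under the same review process as everything else.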


What changes in practice

The feedback loop shortens significantly. Before: a developer ships a deployment without resource limits, it runs in the cluster for weeks, a platform team audit finds it eventually. After: the sync fails at admission with a clear message, the developer fixes it in the same PR cycle.

For platform teams, the shift is from reactive to proactive. You're not hunting down non-compliant workloads — you're defining what compliance looks like in code, and the cluster enforces it continuously. And because the policies are in Git, the platform team can see and review every policy change, just like any other infrastructure change.

The compound effect

ArgoCD handles deployment consistency, Kyverno handles policy enforcement, Git is the source of truth for both. Policy-as-code isn't a separate tool you layer on top — it's just more GitOps. After six months, "it's in Git so it must be right" actually becomes true.


Frequently asked questions

Will Kyverno slow down our deployments?

Kyverno runs as an admission webhook — every resource creation or update goes through it. The latency impact is typically in the single-digit milliseconds for simple validate policies. For complex policies with lots of rules, you might see 20–50ms per resource. In practice, this is imperceptible in a GitOps workflow where reconciliation is async. The only place it shows up is high-frequency resource creation, like batch jobs spawning thousands of pods rapidly — and even then, tuning webhookTimeoutSeconds and scoping policies to specific resource kinds keeps it manageable.

How do we roll out Kyverno to an existing cluster with lots of non-compliant workloads?

Start every new policy in audit mode — violations are logged as PolicyReports but nothing is blocked. Then run kubectl get policyreport -A to see how many violations exist. Fix the worst offenders, or decide which ones to exempt, then switch the policy to enforce. The key is never flipping a policy to enforce before you've cleared the violations — the blast radius of getting that wrong in a busy cluster is a lot of sync failures all at once. The summarise-violations.py script in the companion repo helps make the audit phase faster.

Can Kyverno and OPA/Gatekeeper coexist in the same cluster?

Yes, but you probably don't want both. They solve the same problem with different approaches — Kyverno uses Kubernetes-native YAML policies, Gatekeeper uses Rego. Pick one. Kyverno is generally the easier choice for teams that want to write and maintain policies without learning Rego. If your org already has Rego investment or uses OPA in other contexts (Terraform, API gateways), Gatekeeper keeps the policy language consistent across the stack.

What happens if Kyverno itself goes down?

By default, Kyverno policies use failurePolicy: Fail, which is fail-closed: if Kyverno is unavailable or the webhook call times out, the API server rejects requests for the resources those policies match, and ArgoCD syncs of those resources fail until Kyverno is back. You can set failurePolicy: Ignore on a policy to fail open instead. That is an availability trade-off: deployments keep flowing during a Kyverno outage, but non-compliant resources are admitted unchecked while it lasts. Decide per policy which risk you would rather carry. Either way, run Kyverno with multiple replicas in production; the Kyverno installation docs use three admission controller replicas for high availability.
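
Whichever default your Kyverno version ships with, the behaviour can be pinned explicitly per policy so it is visible in Git. A sketch of the relevant spec fields, assuming the policy-level failurePolicy and webhookTimeoutSeconds fields (newer releases may expect these under spec.webhookConfiguration, so check the docs for your version):

```yaml
# Fragment of a ClusterPolicy spec; the rules are omitted here.
spec:
  # Fail: reject matching requests when Kyverno cannot be reached (fail-closed).
  # Ignore: admit them unchecked (fail-open).
  failurePolicy: Fail
  # Seconds the API server waits for Kyverno before applying failurePolicy.
  webhookTimeoutSeconds: 10
```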

How do we handle policy exceptions for legacy workloads we can't fix immediately?

Kyverno supports PolicyException resources — you declare an exception for a specific workload, namespace, or set of resources, and Kyverno skips that policy for matching resources. Exceptions are first-class Kubernetes resources, which means they live in Git, go through PR review, and have a full audit trail. This is much better than just widening the policy — you can see exactly who has exceptions, why, and you can tie exceptions to a resolution timeline by adding annotations.
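
A sketch of an exception for one legacy workload, assuming the kyverno.io/v2 API (older releases use a beta version of the same resource) and that policy exceptions are enabled in your Kyverno installation. The namespace, workload name, and annotation keys are hypothetical.

```yaml
apiVersion: kyverno.io/v2
kind: PolicyException
metadata:
  name: legacy-billing-resource-limits
  namespace: legacy
  annotations:
    # Hypothetical annotations that tie the exception to an owner and a deadline.
    platform.example.com/reason: "Vendor image, limits still being profiled"
    platform.example.com/review-by: "YYYY-MM-DD"
spec:
  exceptions:
    - policyName: require-resource-limits
      ruleNames:
        - check-container-limits
        # Rule name Kyverno auto-generates for pod controllers such as Deployments.
        - autogen-check-container-limits
  match:
    any:
      - resources:
          kinds:
            - Pod
            - Deployment
          namespaces:
            - legacy
          names:
            - legacy-billing*
```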


What you get

  • Policy violations caught in the PR cycle, not discovered in a post-incident review weeks later
  • Automatic namespace bootstrapping via generate policies — teams get properly configured namespaces without knowing what the right defaults are
  • Transparent defaults via mutate policies — missing labels, sidecar injections, and sensible resource defaults happen without burdening every team with the knowledge of what "correct" looks like
  • A full audit trail — every policy change is a git commit, every violation is a PolicyReport, every exception is a versioned Kubernetes resource
  • ArgoCD and Kyverno as a compound system — GitOps gives you consistency, Kyverno gives you correctness. Together they make "it's in Git so it must be right" actually true


The working code

The companion repo has all five ClusterPolicy examples from this post — resource limits, team labels, registry restrictions, no-latest-in-production, and auto-generated namespace quotas. There's also a summarise-violations.py script for reviewing policy reports before switching from Audit to Enforce.

→ kyverno-policies example + scripts

# Summarise all violations before enforcing
python scripts/kyverno/summarise-violations.py --failures-only

# Check a specific policy
python scripts/kyverno/summarise-violations.py \
  --policy require-resource-limits \
  --exit-on-violations