AI-driven GitOps with MCP and Argo CD

This was one of those talks where the demo felt uncomfortably close to real life. You could see the practical value straight away: fewer support bottlenecks, faster incident triage, and less context-switching for developers.

Quick takeaways

  • Put AI where engineers already work (Slack/Teams), not in a new interface.
  • MCP makes Argo CD actions discoverable and reusable for different AI clients.
  • Start with operational tasks where feedback is fast: deploy, check health, roll back.
  • Capture troubleshooting logic once and reuse it through Agent Skills.

What was getting in the way

At KubeCon EU 2026, Alexander Matyushentsev (Argo CD co-founder) and Leonardo Luz Almeida (Intuit) laid out the problem clearly: managing 350+ Kubernetes clusters with 3,000+ production services across 50,000+ namespaces creates massive support bottlenecks. Developers wait in Slack channels for troubleshooting help. Expert DevOps engineers drown in repetitive diagnostic work. Traditional UI extensions fail because they don't meet users where they already are.

What we actually wanted

An AI-powered GitOps experience where developers use natural language to deploy applications, troubleshoot production issues, and automatically roll back failures - all through the tools they already use daily (Slack, Claude, Copilot).

Architecture in one view

[Figure: MCP + Argo CD container view]

Model Context Protocol (MCP): the universal connector

MCP is a standardised bridge between AI clients and services like Argo CD. Instead of building custom integrations for every LLM, MCP provides:

  • JSON-RPC protocol over stdio or HTTP (Server-Sent Events or polling)
  • Discovery API exposing tools (functions), resources (read-only data), and prompts (guidance text)
  • Token passthrough for authentication - MCP delegates auth to underlying services
  • One-to-one tool mapping with Argo CD CLI/UI capabilities (sync, inspect, logs, manifests)
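To make the discovery and invocation flow concrete, here is a minimal sketch of the two JSON-RPC 2.0 messages an MCP client would send: `tools/list` to discover tools, then `tools/call` to run one. The tool name `sync_application` and its `applicationName` argument are assumptions for illustration; a real client would use whatever the server's `tools/list` response advertises.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry a unique id per request


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope, the message format MCP uses."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Discovery: ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invocation: call one tool by name with structured arguments.
# "sync_application" and "applicationName" are hypothetical names.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "sync_application", "arguments": {"applicationName": "frontend"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

Because every client speaks the same envelope, the same server can sit behind a Slack bot, an IDE assistant, or a CLI agent without per-client glue.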

Open Source MCP for Argo CD

Available at github.com/argoproj-labs/argocd-mcp with growing community adoption (#mcp-for-argocd on CNCF Slack with 16 members at conference time).

Three use cases that worked in practice

1. Natural Language Application Creation

Before: Fill out complex forms, specify manifest repos, branches, namespaces, sync policies
After: "Create an app called 'frontend' using the manifests from github.com/myorg/apps, main branch, deploy to production namespace"

The AI agent discovers available MCP tools, constructs the Argo CD Application resource, and deploys - faster than manual creation.
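For a sense of what the agent has to assemble from that one sentence, here is a rough sketch of the resulting Argo CD Application resource built as a Python dictionary. The repository path (`.`), the `default` project, the in-cluster destination server, and the `argocd` control-plane namespace are assumptions, since the prompt does not state them.

```python
import json


def build_application(name, repo_url, revision, dest_namespace,
                      path=".", project="default",
                      dest_server="https://kubernetes.default.svc",
                      argocd_namespace="argocd"):
    """Return an Argo CD Application manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": argocd_namespace},
        "spec": {
            "project": project,
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,  # branch, tag, or commit
                "path": path,                # assumed: manifests at repo root
            },
            "destination": {"server": dest_server, "namespace": dest_namespace},
        },
    }


# Values taken from the example prompt above.
app = build_application("frontend", "https://github.com/myorg/apps",
                        "main", "production")
print(json.dumps(app, indent=2))
```

The fields the prompt leaves unspecified are exactly where an agent either applies defaults or asks a follow-up question.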

2. Batch Deployment from Git Directories

Prompt: "Connect to my GitOps repo and create an Argo CD application for each directory under /apps"

The agent:

  • Scans the Git repository structure
  • Identifies manifest directories
  • Creates multiple applications automatically
  • Implements app-of-apps patterns without manual intervention
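The directory-to-application mapping can be sketched in a few lines. This version works on a plain list of file paths rather than a live Git checkout, and it assumes one application per immediate subdirectory of `apps`, named after the directory and deployed to a namespace of the same name. The repository URL and file listing are made up for the example.

```python
from pathlib import PurePosixPath


def app_dirs(file_paths, root="apps"):
    """Return the sorted immediate subdirectories of `root` that hold files."""
    dirs = set()
    for p in file_paths:
        parts = PurePosixPath(p).parts
        # Need root/<dir>/<file>: at least three path components.
        if len(parts) >= 3 and parts[0] == root:
            dirs.add(parts[1])
    return sorted(dirs)


def applications_for(file_paths, repo_url, revision="main", root="apps"):
    """Build one Argo CD Application dict per discovered directory."""
    return [
        {
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "Application",
            "metadata": {"name": d, "namespace": "argocd"},
            "spec": {
                "project": "default",
                "source": {"repoURL": repo_url, "targetRevision": revision,
                           "path": f"{root}/{d}"},
                "destination": {"server": "https://kubernetes.default.svc",
                                "namespace": d},  # assumed naming convention
            },
        }
        for d in app_dirs(file_paths, root)
    ]


# Hypothetical repository listing.
listing = [
    "apps/frontend/deployment.yaml",
    "apps/frontend/service.yaml",
    "apps/api/kustomization.yaml",
    "apps/README.md",
    "docs/intro.md",
]
apps = applications_for(listing, "https://github.com/myorg/gitops")
print([a["metadata"]["name"] for a in apps])  # ['api', 'frontend']
```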

This is genuinely faster than human assembly for repetitive structures.

3. Automated Deployment with Intelligent Rollback

Prompt: "Deploy version 2.0 of my service, monitor its health, and roll back to the previous version if it degrades"

The agent:

  1. Syncs the new version
  2. Continuously checks application health via Argo CD API
  3. Analyzes degradation patterns from status conditions
  4. Automatically reverts manifests on failure detection
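The control loop behind those steps might look roughly like the sketch below. The `sync`, `get_health`, and `rollback` callables are placeholders for whatever MCP tools or API calls the agent actually uses; the status strings follow Argo CD's health vocabulary (`Healthy`, `Progressing`, `Degraded`). Rolling back on timeout is a design assumption here, not something the talk specified.

```python
import time


def deploy_with_rollback(sync, get_health, rollback,
                         checks=10, interval_s=5.0, sleep=time.sleep):
    """Sync, poll health, and roll back if the app degrades or never settles."""
    sync()
    for _ in range(checks):
        status = get_health()
        if status == "Healthy":
            return "healthy"
        if status == "Degraded":
            rollback()
            return "rolled_back"
        sleep(interval_s)  # still Progressing (or unknown): wait and retry
    rollback()  # assumption: treat "never became healthy" as a failure
    return "timed_out_rolled_back"


# Simulated run: the new version progresses, then degrades.
statuses = iter(["Progressing", "Progressing", "Degraded"])
events = []
result = deploy_with_rollback(
    sync=lambda: events.append("sync v2.0"),
    get_health=lambda: next(statuses),
    rollback=lambda: events.append("rollback"),
    interval_s=0,
)
print(result, events)  # rolled_back ['sync v2.0', 'rollback']
```

Injecting the three operations keeps the loop testable without a cluster, which is also how you would want to dry-run an agent's rollback policy before trusting it in production.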

This reduces Mean Time To Recovery (MTTR) from hours to minutes.

Intuit's production journey

Failed Experiment: Argo CD UI Extension

Intuit initially built an AI-powered troubleshooting extension directly in the Argo CD UI:

  • Extracted logs, Kubernetes events, live state, desired state via Argo CD API
  • Provided LLM-powered root cause analysis

Result: Poor adoption - experts went straight to logs, novices never opened Argo CD

Key Lesson: Don't build new interfaces; integrate where users already are.

Breakthrough: Slack Bot Integration

Moving AI troubleshooting into existing Slack support channels achieved dramatically higher engagement:

Example 1: Stack Trace Analysis
Developer: "I'm seeing warnings in argo logs, not sure if critical"
Bot: "Not critical. The class is attempting to cast byte array during schema validation. Check implementation at line 76, ensure serialization order is: byte array → deserialized logic → schema validation."

Example 2: Multi-Failure Triage
Developer: "Production deployment issue"
Bot: "I see TWO failures: (1) Recent deployment can't reach config server, (2) Current running version is degraded. Which should I investigate?"
Developer: "Current failure"
Bot (2.5 min later): "Application cannot retrieve configuration from Spring config server due to connection issue. Root cause: App configured with e2e environment URL while this is production. Update URL from config-e2e.company.com to config-prod.company.com"

Reverse Proxy Pattern

A single MCP server acts as a facade for 40+ Argo CD instances, simplifying:

  • Developer access management
  • Service-to-service communication
  • Security boundaries and token distribution
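As a rough illustration of the facade idea (not Intuit's implementation), the routing step reduces to a lookup from a logical instance name to an upstream base URL, with the caller's token passed through unchanged. The instance names and hostnames below are hypothetical; `/api/v1/applications/{name}` is the Argo CD REST path for a single application.

```python
# Hypothetical registry: logical instance name -> Argo CD base URL.
INSTANCES = {
    "payments": "https://argocd-payments.example.com",
    "identity": "https://argocd-identity.example.com",
}


def route(instance, api_path, bearer_token):
    """Resolve the upstream URL and headers for one proxied request."""
    base = INSTANCES.get(instance)
    if base is None:
        raise KeyError(f"unknown Argo CD instance: {instance}")
    return {
        "url": f"{base}/{api_path.lstrip('/')}",
        # Token passthrough: the facade forwards the caller's credential
        # and leaves authorization to the target Argo CD instance.
        "headers": {"Authorization": f"Bearer {bearer_token}"},
    }


upstream = route("payments", "/api/v1/applications/frontend", "example-token")
print(upstream["url"])
```

Clients only need to know one endpoint and an instance name; the mapping to 40+ real servers lives in one place.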

Agent Skills: The Next Evolution

To avoid duplicating troubleshooting logic across UI extensions, bots, and CLI tools, Intuit is experimenting with Agent Skills - reusable markdown-based diagnostic recipes.

Skills define:

  • How to extract Argo CD base URLs and application names
  • Step-by-step troubleshooting procedures for degraded apps
  • Which API calls to make and how to interpret responses
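The first of those steps, pulling a base URL and application name out of a link someone pasted into a channel, is easy to sketch. This assumes the Argo CD UI URL layout `/applications/<name>` or `/applications/<namespace>/<name>`; the hostname is made up, and a real skill would describe this in markdown for the agent rather than ship code.

```python
from urllib.parse import urlsplit


def parse_app_url(url):
    """Split an Argo CD UI link into (base_url, application_name)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if "applications" not in segments:
        raise ValueError(f"not an Argo CD application URL: {url}")
    i = segments.index("applications")
    rest = segments[i + 1:]
    if not rest:
        raise ValueError(f"no application name in URL: {url}")
    base = f"{parts.scheme}://{parts.netloc}"
    if i > 0:
        # Keep any path prefix the instance is served under.
        base += "/" + "/".join(segments[:i])
    return base, rest[-1]  # last segment is the app name in both layouts


print(parse_app_url("https://argocd.example.com/applications/argocd/frontend?view=tree"))
# ('https://argocd.example.com', 'frontend')
```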

This creates a library of operational knowledge that any agent can consume.

Implementation Guide

For platform teams

  1. Deploy MCP for Argo CD from argoproj-labs (github.com/argoproj-labs/argocd-mcp)
  2. Configure HTTP transport for remote access (OAuth 2.0 or token passthrough)
  3. Integrate with existing support channels (Slack, Teams) rather than building standalone UIs
  4. Create prompts that instruct AI on common failure patterns:
       • Image pull errors from Kubernetes events
       • Resource limit violations from quota checks
       • Config server misconfigurations from environment mismatches

For developers

  1. Test free-form application creation with natural language
  2. Experiment with batch operations via Git directory scanning
  3. Implement automated health monitoring with rollback conditions
  4. Measure engagement: track before/after metrics for bot interactions vs UI usage

For organisations

  1. Document troubleshooting workflows as Agent Skills in markdown format
  2. Share diagnostic logic across tools to eliminate duplication
  3. Consider MCP for other GitOps ecosystem tools (Flux, Argo Rollouts, External Secrets Operator)

What changed in practice

Before: Weeks-long support backlogs, expert knowledge bottlenecks, manual deployments
After: Conversational GitOps operations, automated diagnostics, self-service at scale

The convergence of GitOps and AI through standardised protocols like MCP changes how platform teams operate. By meeting users where they are and encoding operational knowledge in reusable formats, teams move from "Platform as Code" to "Platform as Conversation".


Presented at KubeCon + CloudNativeCon Europe 2026 by Alexander Matyushentsev (Akuity) and Leonardo Luz Almeida (Intuit)