Cloud Native & Open Source AI Conference: Notes
These are my working notes from the sessions I attended.
The throughline across almost everything was the same tension platform teams have been sitting with for months. Agents are genuinely useful, and making them safe to run at enterprise scale is harder than anyone wants to admit. Every speaker who touched on agentic workloads eventually arrived at the same place: governance isn't a constraint on adoption but a prerequisite for it. The Kafka session was the exception; it was about a completely different kind of complexity.
I'm not going to reproduce every bullet. What follows is what actually landed, what I'm thinking about differently, and what I'd do next week.
Keynote: Kubernetes, Headlamp, and Agentic Governance
Andrew Randall, Pal Lakatos-Toth — Microsoft
The keynote opened with a reflection on why Kubernetes won. The answer wasn't "better technology". It was better technology and a genuinely open, inclusive community. Both. Neither on its own.
That framing matters more than it sounds, because the second half of the keynote was about agentic workloads on Kubernetes, and the same logic applies. You don't make agents safe by picking the right model. You make them safe by building the right community of practice around them, and that starts with governance that engineers actually trust.
On the UX side: Headlamp is now the official upstream Kubernetes UI. SIG UI has deprecated the previous dashboard. Multi-cluster visibility, a plugin model, a desktop app — real answers to onboarding friction, not cosmetic improvements. If you have engineers who avoid the CLI because it's overwhelming, Headlamp is worth a proper evaluation before the end of the quarter.
The demo that stuck with me was the agentic governance piece. CRD-driven controls specifying which tools an agent can call, what it can't do without operator approval, and how it routes through an inference policy before touching anything. Not "prompt the agent to be careful". Enforce it. Rate limits, budget controls, egress constraints — all expressed as Kubernetes resources. The agent proposes a fix, the operator approves, the controller reconciles. That's the loop you want.
The governance pattern that actually works
Tool policy, inference policy, and runtime definition were all expressed as Kubernetes CRDs and shown running live on stage. The agent can still be useful; it just can't be reckless. The key distinction is between agent autonomy and agent authority. Autonomy is what the agent decides to do. Authority is what it's actually permitted to do. Enforce the latter with policy, not prompts.
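To make the autonomy/authority split concrete, here is a minimal Python sketch of the authority side. The `ToolPolicy` fields are hypothetical and only loosely mirror what a tool-policy CRD might carry. They are not the schema shown in the keynote. The agent can propose anything, and the check decides whether the action runs, waits for an operator, or is refused.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolPolicy:
    # Hypothetical fields; a real CRD schema is whatever your controller defines.
    allowed_tools: frozenset[str]
    approval_required: frozenset[str]
    egress_allowlist: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    target_host: str | None = None


def authorize(policy: ToolPolicy, action: ProposedAction) -> Decision:
    """Authority check: what the agent may do, whatever it decided to do."""
    known = policy.allowed_tools | policy.approval_required
    if action.tool not in known:
        return Decision.DENY
    # Egress constraint: any outbound target must be explicitly allowlisted.
    if action.target_host is not None and action.target_host not in policy.egress_allowlist:
        return Decision.DENY
    if action.tool in policy.approval_required:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

In a cluster, the equivalent check would sit in the controller or an admission webhook rather than in the agent process, so the agent has no way to skip it.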
What to do next:
- Evaluate Headlamp as your default multi-cluster UI in a lab before rolling it out broadly
- Map one platform workflow that could become a CRD-backed agent contract
- Define your minimum governance baseline: tool policy, inference policy, workload identity, audit trail
- Build development and validation loops on kind before managed-cluster rollout
The Massively Parallel Agent Stack
Peter Bhabra — Doubleword
This was the talk I'd been hoping to see. Not "here's a demo where the agent writes code". But "here's why one giant agent is the wrong shape for this problem".
The use case was a security audit across a reasonably sized codebase — 512 files, roughly 2.5 million tokens of source. The single-agent approach burned through context, churned, and delivered weak coverage. You've probably seen this: the agent gets increasingly confused about what it's already seen and starts generating output that's plausible but thin.
The swarm approach worked differently. An orchestrator decides team composition and carves up the work. Specialist workers — each with a clear, bounded role like "find injection patterns" or "check unsafe file access" — process their slices in parallel. They return structured, schema-bound outputs. And critically: an independent verifier reviews each finding before it's included. A dedicated synthesiser then turns verified findings into an actionable report.
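The structure of that pipeline can be sketched in Python with `asyncio`. This is a toy built on stated assumptions. The "workers" are deterministic substring matchers standing in for model calls, and the role names and patterns are hypothetical. What it does show is the shape of the pipeline:

- the orchestrator carves the files into slices;
- role-specific workers run in parallel;
- an independent verifier drops any finding whose quoted evidence isn't actually in the cited file;
- a synthesiser groups whatever survives.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    role: str
    file_path: str
    evidence: str


async def run_worker(role: str, pattern: str, files: dict[str, str]) -> list[Finding]:
    # Stand-in for a model call: a real worker would send its slice plus a
    # role-specific prompt and parse schema-bound output.
    await asyncio.sleep(0)
    return [
        Finding(role, path, line.strip())
        for path, text in files.items()
        for line in text.splitlines()
        if pattern in line
    ]


def verify(finding: Finding, files: dict[str, str]) -> bool:
    # Independent check: the quoted evidence must exist in the cited file.
    return finding.evidence in files.get(finding.file_path, "")


def slice_files(files: dict[str, str], n: int) -> list[dict[str, str]]:
    # Round-robin split so each worker keeps a small, bounded context.
    items = sorted(files.items())
    return [dict(items[i::n]) for i in range(n)]


def synthesise(findings: list[Finding]) -> dict[str, list[str]]:
    report: dict[str, list[str]] = {}
    for f in findings:
        report.setdefault(f.role, []).append(f"{f.file_path}: {f.evidence}")
    return report


async def orchestrate(
    files: dict[str, str], roles: dict[str, str], n_slices: int = 2
) -> dict[str, list[str]]:
    # One task per (role, slice) pair, all run concurrently.
    tasks = [
        run_worker(role, pattern, chunk)
        for role, pattern in roles.items()
        for chunk in slice_files(files, n_slices)
    ]
    batches = await asyncio.gather(*tasks)
    findings = [f for batch in batches for f in batch]
    verified = [f for f in findings if verify(f, files)]
    return synthesise(verified)
```

Run it with `asyncio.run(orchestrate(files, {"injection": "execute(", "file_access": "open("}))`. With stub workers the verifier always passes. With real model output, it is the stage that catches invented quotes.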
Better coverage, lower token cost. And the same runtime can be repointed at completely different use cases just by swapping prompts, schemas, and tools. The economic model for background audits is asynchronous inference — you don't need low-latency responses for something that runs overnight.
Context window size is not the same as coverage quality
Bigger context doesn't mean the agent sees more clearly — it often means it sees everything less clearly. The parallel worker approach isn't just about speed. It's about keeping each worker's context small, specific, and focused. A verifier stage before surfacing findings to engineering backlogs isn't optional. It's how you avoid noisy or hallucinated issues becoming real tickets.
What to do next:
- Prototype a small swarm runtime for one internal use case (dependency review or secret scanning are good starting points)
- Define a minimal findings schema now (severity, confidence, file path, evidence, recommendation) before you need one
- Add an independent verifier stage before creating issues in engineering backlogs
- Benchmark single-agent vs swarm on the same repository with the same acceptance criteria
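A minimal findings schema along those lines might look like the Python below. The field names come from the list above. The severity levels, the 0-to-1 confidence range, and `parse_finding` are assumptions for illustration. The point is to reject malformed worker output loudly rather than coerce it into a ticket.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


REQUIRED_FIELDS = {"severity", "confidence", "file_path", "evidence", "recommendation"}


@dataclass(frozen=True)
class Finding:
    severity: Severity
    confidence: float  # assumed range 0.0 to 1.0
    file_path: str
    evidence: str
    recommendation: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
        if not self.file_path or not self.evidence.strip():
            raise ValueError("file_path and evidence must be non-empty")


def parse_finding(raw: dict) -> Finding:
    """Validate one worker result; raise instead of guessing at missing fields."""
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return Finding(
        severity=Severity(raw["severity"]),  # ValueError on unknown levels
        confidence=float(raw["confidence"]),
        file_path=str(raw["file_path"]),
        evidence=str(raw["evidence"]),
        recommendation=str(raw["recommendation"]),
    )
```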
Silent Interpretation Errors at Scale
Maebh Booth — Senior Engineering Leader
This talk named something that's been quietly frustrating platform teams for months. Not "the agent failed" — but "the agent succeeded at the wrong thing and didn't mention it".
The term is silent interpretation drift. The agent makes an implicit decision, picks an interpretation, and moves forward without surfacing the ambiguity. It doesn't ask. It just gets on with it, because forward motion is what it's optimised for. The output often looks correct — tests pass, the prose is coherent — but it's missing something you specified, or it assumed something you didn't. At enterprise scale, across many teams and shared workflows, those small errors compound fast.
Her recommended controls are practical. Intent-based skills that route on cognitive operation rather than exact phrasing. Explicit "stop and ask" rules before agents commit inferred IDs, config defaults, or assumed paths. Deterministic pre-commit hooks that block known-bad patterns — not advisory, hard gates. And telemetry. Without transcript instrumentation and periodic evaluation, you can't prove your guardrails are working. You can only hope they are.
The point that landed hardest: require agents to bundle all their clarifications and ask them once, not interrupt five times mid-task. It sounds small. It changes the interaction model completely.
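One way to encode both ideas is a gate that runs once, after planning and before any write. That covers the "stop and ask" rule and the single bundled round of questions. Here is a minimal Python sketch. The assumption categories and the `Assumption` record are hypothetical, and how the agent populates them is left to your agent framework.

```python
from __future__ import annotations

from dataclasses import dataclass

# Kinds of value an agent must never fill in silently (from the talk:
# inferred IDs, config defaults, assumed paths).
STOP_AND_ASK = {"id", "config_default", "path"}


@dataclass(frozen=True)
class Assumption:
    kind: str    # e.g. "id", "path", "naming" (hypothetical taxonomy)
    name: str
    value: str
    source: str  # "user" if stated explicitly, "inferred" otherwise


def clarification_gate(assumptions: list[Assumption]) -> list[str]:
    """Return one bundled list of questions; an empty list means proceed."""
    return [
        f"{a.kind} '{a.name}': I inferred {a.value!r}. Is that right?"
        for a in assumptions
        if a.source == "inferred" and a.kind in STOP_AND_ASK
    ]
```

Because the gate runs once over the whole plan, the clarifications arrive as one batch rather than as interruptions spread across the task.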
Soft prompts can't do the job of hard gates
Advisory guidance helps at the margins. But for anything high-impact — inferred paths, assumed IDs, hard-coded values — you need deterministic enforcement. One gate plus one measurable evaluation beats ten advisory skills with no feedback loop.
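A deterministic gate can be as plain as a pre-commit script that scans staged content and refuses the commit on a match. The patterns below are hypothetical examples of "known-bad" agent output. Substitute whatever overreach your own audit turns up. The script reads the staged blob (`git show :path`) rather than the working tree, so it checks exactly what would be committed.

```python
from __future__ import annotations

import re
import subprocess
import sys

# Hypothetical "known-bad" patterns; replace with what your own audit finds.
BLOCKED = {
    "hard-coded home path": re.compile(r"/(?:home|Users)/[A-Za-z0-9_.-]+/"),
    "placeholder value": re.compile(r"\b(?:TODO_ID|CHANGEME)\b"),
    "inline 12-digit account ID": re.compile(r"\b\d{12}\b"),
}


def check(files: dict[str, str]) -> list[str]:
    """Return one violation per (file, line, rule) match."""
    violations = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in BLOCKED.items():
                if pattern.search(line):
                    violations.append(f"{path}:{lineno}: {label}")
    return violations


def staged_files() -> dict[str, str]:
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", *args], capture_output=True, text=True, errors="replace", check=True
        ).stdout

    names = git("diff", "--cached", "--name-only", "--diff-filter=ACM").splitlines()
    # ":<path>" is the staged blob, i.e. exactly what would be committed.
    return {name: git("show", f":{name}") for name in names}


def main() -> int:
    violations = check(staged_files())
    for v in violations:
        print(v, file=sys.stderr)
    return 1 if violations else 0
```

Wire it in from `.git/hooks/pre-commit` (or your hook manager) with `sys.exit(main())`. A non-zero exit blocks the commit, which is what makes it a hard gate rather than advice.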
What to do next:
- Audit current agent workflows for recurring overreach: assumed values, invented constants, ignored constraints
- Add one explicit "stop and ask" gate for inferred values in a critical repository this week
- Add push-time metrics for skill activation and non-compliance events
- Schedule a weekly evaluation pass to catch behavioural drift before it compounds
Prompt-Driven Platforms: The Future of Self-Service Infrastructure
Salman Iqbal, Amir Tayabali — Wayfinder
Good demos are easy. Safe, repeatable, enterprise-wide adoption in high-security environments is the hard bit. That's where this talk lived, and it was more honest about the difficulty than most.
The central tension: give agents enough context and access to be genuinely useful, while still enforcing controls that security and compliance teams can audit. In locked-down environments, prompt conventions and local settings don't cut it. You need a gateway, a wrapper, and explicit trust boundaries — not because you distrust the AI, but because you need to explain what happened when something goes wrong.
The shadow IT angle was sharp. If platform teams don't provide a safe, easy path for AI tooling, engineers will build their own. And those homegrown setups won't have the governance controls you spent months designing. The paved road has to be genuinely paved — not a traffic cone maze that takes twenty minutes to navigate.
Context quality came up as a force multiplier. Teams getting the best agent outputs weren't just using better prompts. They were surfacing richer service-catalog metadata through Backstage. Who owns this service? What does it depend on? What are the constraints? When the agent has that context, it makes better decisions. When it doesn't, it guesses.
If you don't build the paved road, engineers will build their own
Shadow AI is the enterprise equivalent of shadow IT. It happens when the unofficial path is easier than the official one. Gateway enforcement and wrapper-based tool launch aren't bureaucracy — they're what lets you say "yes, use it" rather than "use it and good luck".
What to do next:
- Map every current AI access path and identify which ones bypass policy controls
- Pilot gateway-mediated model access in one team before broader rollout
- Invest in service-catalog metadata quality; it's a force multiplier for agent output quality
- Publish a small set of platform-curated skills and track actual adoption
Controlling AI Agent Access in Cloud-Native Engineering Workflows
Viola Lykova — nuclecode
The clearest reframe of the day: agent access control is a platform engineering problem, not a prompt engineering problem.
The moment agents move from autocomplete to autonomous action — triggering CI, opening PRs, mutating infrastructure — they become execution actors. And execution actors need the same rigour as human privileged access. The problem isn't that agents are malicious. It's that their scope tends to expand quietly over time. You add a tool. You widen a token scope. You give them access to production "just for diagnostics". Three months later you're looking at a token with write permissions across five services that was originally scoped to read-only logs.
Drift is the word she used. Not a single bad decision, but incremental scope expansion that's hard to notice until you audit it. And without continuous drift detection, you won't audit it until something goes wrong.
The state mutation framing matters. Every tool call can change platform reality. Every API call is evidence. The discipline is: explicit permission contracts, scoped tokens per tool per action class, approval checkpoints for high-risk changes, and forensic audit trails that attach actor, scope, rationale, and approval metadata to each meaningful mutation. Prompts can influence behaviour. They are not security boundaries.
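That discipline can be sketched as a single choke point that every tool call goes through. It combines an explicit contract, scope per tool and action class, an approval checkpoint, and an audit record. This is a minimal Python illustration. `CONTRACT`, the action classes, and the record layout are hypothetical. In practice the log would go to append-only storage and each grant would map to its own scoped credential. Denied attempts are logged too, since attempted overreach is exactly the drift signal you want.

```python
from __future__ import annotations

import json
import time

# Hypothetical permission contract for one agent identity:
# (tool, action_class) -> whether the action is high-risk and needs approval.
CONTRACT: dict[tuple[str, str], bool] = {
    ("ci", "read"): False,
    ("ci", "trigger"): False,
    ("infra", "write"): True,
}


class PermissionDenied(Exception):
    pass


def execute(actor: str, tool: str, action_class: str, rationale: str,
            audit_log: list[str], approved_by: str | None = None) -> dict:
    high_risk = CONTRACT.get((tool, action_class))
    if high_risk is None:
        outcome = "denied: no grant"
    elif high_risk and approved_by is None:
        outcome = "denied: approval required"
    else:
        outcome = "allowed"
    record = {
        "ts": time.time(),
        "actor": actor,
        "scope": f"{tool}:{action_class}",
        "rationale": rationale,
        "approved_by": approved_by,
        "outcome": outcome,
    }
    # Every attempt is recorded, including denials.
    audit_log.append(json.dumps(record, sort_keys=True))
    if outcome != "allowed":
        raise PermissionDenied(outcome)
    # Here you would perform the mutation using a token scoped to exactly
    # this (tool, action_class) grant, never a shared broad credential.
    return record
```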
A prompt is not a permission boundary
Natural-language instructions guide what an agent wants to do. Tokens, scopes, and policy enforcement determine what it can do. These are not the same thing, and treating them as equivalent is how you end up with an agent that's polite about exceeding its authority.
What to do next:
- Create an agent access inventory (tool, token, scope, owner, environment); even a spreadsheet works to start
- Replace broad shared credentials with purpose-specific scoped tokens per tool and action class
- Add policy checks that block scope increases without explicit approval
- Define and test rollback paths for high-impact mutations before you need them in production
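Once the inventory exists, "block scope increases without explicit approval" reduces to a diff between two snapshots. Here is a minimal Python sketch. It assumes each row records a token identifier (never the secret) and its scopes. It also assumes approvals are available as (token, scope) pairs from your change process. Both the row shape and the approval format are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRow:
    tool: str
    token: str              # token identifier, never the secret itself
    scopes: frozenset[str]
    owner: str
    environment: str


def scope_drift(
    baseline: list[InventoryRow],
    current: list[InventoryRow],
    approved: set[tuple[str, str]],
) -> list[tuple[str, list[str]]]:
    """Return (token, unapproved_new_scopes) for every unapproved scope increase.

    Tokens absent from the baseline count as entirely new, so all their
    scopes need approval too.
    """
    before = {row.token: row.scopes for row in baseline}
    drift = []
    for row in current:
        added = row.scopes - before.get(row.token, frozenset())
        unapproved = sorted(s for s in added if (row.token, s) not in approved)
        if unapproved:
            drift.append((row.token, unapproved))
    return drift
```

Run it on a schedule against each fresh snapshot. An empty result is the only passing state.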
Platform as a Product: What Happens When We Treat Security as a User?
Hannah Foxwell — Bimp
If your platform team claims to treat the platform as a product but security teams are still downstream reviewers rather than first-class users, you're not actually doing it.
That's the core provocation, and it landed. Developer experience has received enormous investment over the last few years — golden paths, self-service, reducing friction. Security teams have largely been left with manual triage queues that don't scale with the pace of discovery. And the AI-era wrinkle makes this worse: vulnerability discovery is accelerating. AI-assisted scanning will find more CVEs, faster. Traditional human-only triage is already approaching its limits, and we're at the beginning of the curve, not the middle.
The practical moves aren't glamorous, but they're achievable. Filter and correlate findings before they hit human queues. Surface ownership metadata on every finding so the right team gets the right alert. Attach runtime environment and data sensitivity to each finding so they drive how urgently it's prioritised. Automate the obvious patches: not everything, just the changes with high confidence and low blast radius. That alone removes a significant chunk of the toil and lets security teams focus on the decisions that actually require human judgment.
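Here is a rough Python sketch of that pipeline:

- correlate duplicate findings into one work item;
- attach the owner and context from catalog metadata;
- weight priority by environment and data sensitivity;
- mark only low-blast-radius fixes as auto-patch candidates.

The catalog entries, weights, and auto-patch rule are all hypothetical placeholders. The multiplicative score is one simple choice, not a recommended formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RawFinding:
    cve: str
    package: str
    service: str
    cvss: float
    fix_available: bool


# Hypothetical catalog metadata, e.g. exported from Backstage or a CMDB.
CATALOG = {
    "payments": {"owner": "team-payments", "env": "production", "data": "pci"},
    "wiki": {"owner": "team-docs", "env": "internal", "data": "public"},
}
# Unknown services are treated conservatively.
UNKNOWN = {"owner": "unowned", "env": "production", "data": "pii"}
ENV_WEIGHT = {"production": 2.0, "staging": 1.0, "internal": 0.5}
DATA_WEIGHT = {"pci": 2.0, "pii": 1.5, "public": 1.0}


def triage(findings: list[RawFinding]) -> list[dict]:
    # Correlate: the same CVE in the same package and service is one work item.
    unique: dict[tuple[str, str, str], RawFinding] = {}
    for f in findings:
        unique.setdefault((f.cve, f.package, f.service), f)

    items = []
    for f in unique.values():
        meta = CATALOG.get(f.service, UNKNOWN)
        items.append({
            "cve": f.cve,
            "service": f.service,
            "owner": meta["owner"],
            "priority": f.cvss * ENV_WEIGHT[meta["env"]] * DATA_WEIGHT[meta["data"]],
            # Only auto-patch when a fix exists and the blast radius is small.
            "auto_patch": f.fix_available and meta["env"] != "production",
        })
    return sorted(items, key=lambda item: item["priority"], reverse=True)
```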
Security teams are platform users — run the same discovery with them
Interview your security team the way you'd interview developers for user research. Map their current pain in your platform workflows. The gap between what security teams need and what platform teams have built for them is often embarrassingly wide, and most of it is fixable once you can see it clearly.
What to do next:
- Run one user research session with your security team, treating them as platform users
- Add ownership and impact metadata to every service and surface it in security findings
- Pilot one automated remediation workflow for a high-frequency, low-risk vulnerability class
- Define a response playbook for high-severity disclosures and rehearse it before you need it
Beyond the Portal: AI-Native Platforms with CNOE and MCP
Hossein Salahi
The portal isn't dead. But it's no longer the only surface that matters, and for some users it was never quite the right one.
That's the honest read of this talk. Portals did useful work: they standardised self-service and gave platform teams a way to offer repeatable workflows without writing a Slack bot for everything. But for operational users — SREs, incident responders, people who are in a terminal or a chat window when something is on fire — a static form wizard is the wrong shape.
The CNOE/MCP model being shown here is additive, not a replacement. Your existing GitOps, policy controls, and capability planes stay exactly where they are. What changes is the interface and control model. An alert triggers agent reasoning. The agent gathers context through MCP-connected tools. It proposes a Git-based fix. A human approves. The controller reconciles. No new control plane, no parallel infrastructure — just a better interface for an operational workflow that already exists.
The piece I'd push on in my own context: cross-session memory. If the same class of alert fires again next month, the agent should have context from the previous resolution. That's what turns a one-off demo into a durable operational pattern.
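Here is a minimal sketch of that memory in Python, using the standard library's `sqlite3`. It rests on two assumptions. First, an alert is fingerprinted on a few stable labels (the label names here are hypothetical), so volatile fields like pod names or timestamps don't break the match. Second, each resolution is stored with a short summary and a reference to the PR that fixed it. On a repeat alert, the agent retrieves the most recent resolutions as extra context.

```python
from __future__ import annotations

import hashlib
import json
import sqlite3

# Hypothetical choice of labels that identify "the same alert" across firings;
# volatile labels such as pod name or timestamp are deliberately excluded.
STABLE_LABELS = ("alertname", "service", "namespace")


def fingerprint(alert: dict) -> str:
    key = {label: alert.get(label) for label in STABLE_LABELS}
    return hashlib.sha256(json.dumps(key, sort_keys=True).encode()).hexdigest()[:16]


class ResolutionMemory:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS resolutions "
            "(fp TEXT, summary TEXT, pr_ref TEXT, ts REAL)"
        )

    def record(self, alert: dict, summary: str, pr_ref: str, ts: float) -> None:
        self.db.execute(
            "INSERT INTO resolutions VALUES (?, ?, ?, ?)",
            (fingerprint(alert), summary, pr_ref, ts),
        )
        self.db.commit()

    def recall(self, alert: dict, limit: int = 3) -> list[tuple[str, str]]:
        """Most recent resolutions for the same class of alert, newest first."""
        return self.db.execute(
            "SELECT summary, pr_ref FROM resolutions WHERE fp = ? "
            "ORDER BY ts DESC LIMIT ?",
            (fingerprint(alert), limit),
        ).fetchall()
```

Pass a file path (or swap in your own store) to make the memory survive restarts. The in-memory default is only for trying it out.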
Start with one operational workflow that's currently too slow in a portal
Don't frame this as "AI-native replaces the portal". Frame it as "operational users who can't afford to navigate forms under pressure now have a better interface". Pick one high-friction workflow — noisy alert class, slow diagnosis path — and build the agent contract for that. Don't start with a platform overhaul.
What to do next:
- Identify one operations workflow that's genuinely painful in portal-only mode
- Define an agent contract for it: scope, tools, allowed actions, approval requirements
- Route all production-affecting changes through PRs with required human review, even in the agentic path
- Capture incident telemetry and assess whether agent-assisted workflows actually improve MTTR
Intro to Apache Kafka on Aiven: From Managed Simplicity to Inkless Architectures
Hugh Evans — Aiven
The different-shaped talk. Not about agents. About a real shift in how Kafka stores data.
Classic Kafka replicates partitions across brokers across availability zones. That's the durability model. It works. It's also expensive — cross-AZ data transfer is one of the biggest cost drivers in managed Kafka deployments. And operationally, it's heavy: rebalancing when you add or replace a broker is a genuine pain that gets worse as clusters grow.
Diskless Kafka — sometimes called "inkless" or "disaggregated", the terminology is still settling — moves durability off broker-local disks and onto object storage. The economics are materially different: object storage replication doesn't carry the same cross-AZ transfer cost as broker-to-broker replication. You also get less rebalancing friction when scaling horizontally, because broker state isn't tied to local disk.
The trade-off is latency. If your workload is latency-sensitive, the object storage path adds overhead you need to measure and design for. Mixed-mode support — running classic and diskless topics in the same cluster — gives you the flexibility to choose per workload rather than making a binary call for the whole cluster.
Worth watching, not uncritically adopting today. The open-source governance path involves competing KIPs and community alignment work, which is a useful reminder that "accepted direction" and "production ready" are different milestones.
Pull your cross-AZ transfer costs before anything else
That number is the entire justification for a pilot. For many teams it will be larger than expected, and the potential saving from diskless topics for the right workload class is real. The latency trade-off matters, but it's manageable if you profile first.
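A back-of-the-envelope estimator can produce that number before the billing export does. The Python below is a simplified model under explicit assumptions:

- replicas are spread one per AZ;
- clients are spread evenly across AZs;
- consumers fetch from the partition leader unless rack-aware (follower) fetching is enabled.

The per-GB price is deliberately left out. Take it from your provider's current pricing.

```python
def cross_az_gb_per_month(
    ingress_mb_per_s: float,
    replication_factor: int = 3,
    azs: int = 3,
    consumer_fanout: float = 1.0,
    rack_aware_consumers: bool = False,
) -> float:
    """Estimate monthly cross-AZ traffic in GB for a classic (replicated) Kafka cluster."""
    if replication_factor > azs:
        raise ValueError("model assumes at most one replica per AZ")
    produced_gb = ingress_mb_per_s * 60 * 60 * 24 * 30 / 1024
    # A producer reaches a leader in another AZ (azs - 1) / azs of the time.
    producer = produced_gb * (azs - 1) / azs
    # Each follower replica lives in a different AZ from the leader.
    replication = produced_gb * (replication_factor - 1)
    # Consumers fetching from the leader cross AZs at the same rate as producers.
    consumer = 0.0 if rack_aware_consumers else produced_gb * consumer_fanout * (azs - 1) / azs
    return producer + replication + consumer
```

Take a cluster with 10 MB/s of ingress, replication factor 3, three AZs, and one consumer group. The model gives 84,375 GB of cross-AZ traffic a month. At an illustrative $0.02 per GB, that is about $1,690. Replication is the largest term here, and it is the traffic that diskless topics shift onto object storage.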
What to do next:
- Profile current Kafka workloads by latency sensitivity and retention/replay patterns
- Quantify cross-AZ transfer costs as a baseline before running any pilot
- Test mixed-mode topics in a non-production cluster with p95/p99 latency measurement
- Define clear topic placement rules (classic vs diskless) for engineering teams to use
The Age of "Big Tech" is Over
Sean M Tracey — Mitchell Technologies
The most contrarian talk of the day. Also probably the most honest.
The argument wasn't about market share or antitrust. It was more personal and more pointed: we've stopped questioning the assumption that progress requires giant platforms with giant defaults. We accept that a simple internal tool needs Kubernetes with five nodes, a managed database, a load balancer, and a monthly bill that would embarrass a small-team startup. We call it "enterprise-grade". Sometimes it's just overengineered.
Sean's counter: build small. Build local. Measure actual usage before sizing anything. The live demo — lightweight app deployment, a browser-based calling flow running with minimal resource usage — wasn't technically impressive by the standards of the rest of the conference. That was the point. Most of what we build doesn't need impressive infrastructure. It needs sufficient infrastructure.
The open source compounding argument is worth sitting with. Decades of shared tooling have made good software dramatically cheaper to build than it used to be. But we've developed habits of complexity that cancel out most of that gain. We reach for patterns we adopted when they were necessary, in contexts where they're not.
I'm not sure "the age of big tech is over" quite holds up as a thesis. But "our default infrastructure assumptions are probably wrong more often than we check" — that I buy.
Start from the minimum, measure your way up
When you next propose infrastructure for a new internal tool, start from the smallest thing that solves the real user need. "We'll scale later" is reasonable. "Let's start with enterprise defaults just in case" is where the waste compounds over years.
What to do next:
- Audit one current service for over-provisioning against measured usage data, not assumed load
- Pilot one new internal tool using the smallest viable deployment path and compare delivery speed
- Compare total cost of ownership for one workflow across your current platform vs a lighter-weight path
- Document a "small-first" engineering guide for new internal projects
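For the over-provisioning audit, comparing requested capacity against observed p95 usage is usually enough to start the conversation. The Python below uses the nearest-rank percentile and a 30% headroom factor. Both are assumptions, as is measuring CPU in millicores. Feed it samples from whatever metrics system you already have.

```python
from __future__ import annotations


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def rightsizing(requested: float, samples: list[float], headroom: float = 1.3) -> dict:
    """Compare a resource request with measured p95 usage plus headroom."""
    observed = p95(samples)
    suggested = observed * headroom
    return {
        "requested": requested,
        "p95": observed,
        "suggested": round(suggested, 1),
        "overprovisioned_x": round(requested / suggested, 1),
    }
```

Consider a service that requests 2000 millicores but peaks at a p95 of 190. It comes back as roughly 8x over-provisioned, which is the kind of number that makes "start small" an easy argument.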
What I'm taking away
Taken together, the sessions painted a fairly consistent picture.
Agents are moving into operational workflows, not as a future capability but as something teams are actively deploying now. The teams doing it well share three things: governance expressed as policy and code rather than prompts, genuine observability over what agents are actually doing, and a clear human-in-the-loop for high-impact changes. The teams doing it badly are relying on natural-language instructions to enforce security boundaries, and those instructions aren't boundaries.
On the platform side: portals aren't being replaced, but they are being supplemented. The operational interface is shifting toward conversational and MCP-connected patterns for users who live outside the portal. Context quality — service ownership, dependency metadata, team annotations — is becoming a force multiplier for agent usefulness. The teams with richer catalogs are getting better outputs.
And the Kafka session was a useful reminder that not everything interesting is about AI. Diskless architecture is a real cost and operational improvement for the right workloads, and it's worth following as it matures through the KIP process.
The question I keep coming back to is the one from Maebh Booth's session. If your team is using AI agents and hasn't explicitly defined what "stop and ask" means versus "use your judgment", you don't know what you're actually running. That conversation is worth having before something goes quietly wrong.
Which session was most immediately actionable?
The parallel swarm talk (Peter Bhabra) is probably the most directly applicable. The pattern is simple enough to prototype in a day or two, and the "independent verifier before backlogs" insight applies to almost every team running agents at scale.
Is governance-first realistic for smaller teams?
Yes. You don't need a full CRD-based control plane to start. Viola Lykova's access inventory approach — tool, token, scope, owner, environment — can live in a spreadsheet today. The principle scales down; the implementation grows with you.
What's the one thing I'd take into a Monday morning team meeting?
The silent interpretation drift frame from Maebh Booth's session. Ask the team where "stop and ask" ends and "use your judgment" begins for the agents you already run, and write the answer down.
Is the 'age of big tech is over' framing credible?
Probably not literally. But the underlying point — that our default infrastructure sizing assumptions are wrong more often than we check — is worth taking seriously regardless of how the headline lands.