Platform engineering is a sociotechnical problem

This was one of the better platform talks at KubeCon because it dealt with a problem most teams eventually hit and rarely admit openly: the platform looks solid, the engineering is serious, the abstractions are tidy, and yet users keep finding ways around it.

Sony Interactive Entertainment used that tension as the starting point for a much more honest discussion about internal platforms. The core lesson was simple: once a platform serves multiple teams with different constraints, architecture stops being the whole story. What matters just as much is how teams interact, how decisions are made, and how you measure whether any of it is actually helping.

Quick takeaways

Platform bypasses are usually feedback, not just non-compliance.
Better architecture helps, but it does not remove coordination failure.
Product thinking starts when you measure adoption and usability, not just delivery.
"Done" should mean relied on, not merely implemented.

Producer-consumer platform model

The producer-consumer framing cuts through coordination sprawl

Once a platform serves multiple teams, "who needs to talk to whom and when?" becomes the real bottleneck — not the architecture. Sony's producer-consumer framing is a practical tool for answering that: identify who owns a capability, who depends on it, and when interaction is actually necessary. Reduce implicit dependencies and make the contracts explicit. That's where coordination overhead shrinks.

What was getting in the way

Sony described a pattern that will sound familiar to a lot of platform teams.

They had already done the work people normally recommend:

operators
pipelines
standardised deployment paths
abstractions and golden paths
cross-functional platform teams

From the outside, the platform looked mature.

But inside the organisation, the signals told a different story:

backlog kept growing
teams kept asking for exceptions
some teams wanted direct access to lower-level infrastructure
other teams bypassed the platform entirely and rebuilt pieces themselves

That was not happening because users wanted chaos. It was happening because the platform did not always fit the reality of the teams consuming it.

That reframing matters. Once you see those behaviours as product feedback instead of governance failure, the problem becomes much clearer.

Why architecture was necessary but not sufficient

One part of the talk focused on architecture patterns that still matter a lot:

controller logic
reconciliation loops
clear resource contracts
composable boundaries
producer-consumer relationships between teams

That is good platform engineering. It gives teams cleaner interfaces and a clearer mental model.
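The reconciliation loop from that list is the easiest of these patterns to show in code. What follows is a generic, minimal sketch of the idea (compare desired state with observed state, emit the actions that close the gap), not Sony's controller code; the `reconcile` and `apply_actions` names and the dict-of-specs state model are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update" or "delete"
    name: str  # resource name


def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[Action]:
    """Diff desired state against observed state and return the actions that
    would make them converge. Once those actions are applied, calling it again
    returns an empty list, which is what makes the loop safe to re-run."""
    actions: list[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != desired[name]:
            actions.append(Action("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply_actions(actions: list[Action], desired: dict[str, dict], actual: dict[str, dict]) -> None:
    """Hypothetical stand-in for the API calls a real controller would make."""
    for action in actions:
        if action.verb == "delete":
            del actual[action.name]
        else:  # create and update both copy the desired spec
            actual[action.name] = dict(desired[action.name])


desired = {"db": {"size": "small"}, "cache": {"replicas": 2}}
actual = {"db": {"size": "large"}, "queue": {}}
plan = reconcile(desired, actual)
print(plan)
# [Action(verb='create', name='cache'), Action(verb='update', name='db'), Action(verb='delete', name='queue')]
apply_actions(plan, desired, actual)
print(reconcile(desired, actual))  # [] once converged
```

A real controller would re-run this on every change event and on a periodic resync; the contract between teams is the shape of `desired`, which is exactly the "clear resource contract" the list refers to.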

But Sony's experience was that architecture improvements did not automatically solve scaling problems between teams. As the number of consumers and capabilities grew, coordination overhead grew with it.

That is where a lot of platform teams get stuck. They keep refining the technical model while the real friction is increasingly organisational.

Centralising coordination doesn't solve coordination problems

Sony hit this directly: every dependency routed through the same enablement team looks like a solution but creates a bottleneck. If your team is the unavoidable review step for every capability change, you haven't reduced coordination overhead — you've just consolidated it somewhere it's easier to observe. The fix is clearer capability boundaries and more local decision-making, not a bigger central team.

What broke next: team interaction at scale

Sony described how a previously simple communication model stopped working once more teams entered the platform ecosystem.

In the earlier stage, a team that needed something could just speak directly to the one other team that owned it. Responsibilities were clear. Boundaries were understandable. Coordination cost was manageable.

As the platform expanded, that turned into a mesh of dependencies. A single capability could require several teams in the room at once. Release delays, hidden dependencies, and last-minute escalations became more common. Small changes could take weeks.

They responded by mapping team interaction paths and trying to make the delivery flow more understandable. That helped at first, but it also exposed a trap: too many interactions began converging through an enablement team.

That is a useful warning sign.

If every dependency ends up routed through the same group, you have not solved the coordination problem. You have centralised it.
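One way to make that warning sign measurable is to record the path each request actually takes between teams and count how often the same team sits in the middle. A minimal sketch, assuming those paths can be exported from somewhere such as ticket routing; the team names, the path format and the 50% threshold are all hypothetical.

```python
from collections import Counter


def intermediary_share(paths: list[list[str]]) -> dict[str, float]:
    """For each team, the fraction of interaction paths it sits in the middle of.

    A path lists the teams a request passes through, from the requesting team
    to the owning team; the two endpoints are not counted.
    """
    if not paths:
        return {}
    counts = Counter()
    for path in paths:
        for team in set(path[1:-1]):  # count a team at most once per path
            counts[team] += 1
    return {team: n / len(paths) for team, n in counts.items()}


def bottlenecks(paths: list[list[str]], threshold: float = 0.5) -> list[str]:
    """Teams that mediate at least `threshold` of all paths."""
    shares = intermediary_share(paths)
    return sorted(team for team, share in shares.items() if share >= threshold)


paths = [
    ["games", "enablement", "networking"],
    ["games", "enablement", "storage"],
    ["web", "enablement", "networking"],
    ["web", "identity"],  # a direct producer-consumer conversation
]
print(bottlenecks(paths))  # ['enablement']
```

A share that keeps climbing quarter after quarter is the "centralised, not solved" pattern described above.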

The better question: who produces, who consumes, and when do they need to talk?

One of the most practical ideas in the talk was shifting the conversation towards producer-consumer relationships.

That framing helps answer three useful questions:

who owns this capability?
who depends on it?
when is interaction actually necessary?

That is especially important in globally distributed organisations, where you cannot afford to solve every ambiguity with another recurring meeting.

Sony used this thinking to define clearer capability boundaries and encourage more local decision-making by teams, while still aligning with higher-level organisational goals.

That is a good model for platform teams in general: fewer implicit dependencies, fewer permanent coordination channels, more explicit contracts.
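The three questions map naturally onto a small, explicit contract per capability. This is an illustration under assumptions rather than something shown in the talk: the `Capability` class, the change kinds ("patch", "breaking") and the rule that only breaking changes require consumer involvement are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """An explicit producer-consumer contract for one platform capability."""
    name: str
    producer: str  # who owns this capability?
    consumers: set[str] = field(default_factory=set)  # who depends on it?
    # When is interaction actually necessary? Only for these change kinds;
    # anything else the producer decides and ships locally.
    coordinate_on: set[str] = field(default_factory=lambda: {"breaking"})

    def teams_to_involve(self, change_kind: str) -> set[str]:
        """Teams that need to be in the conversation for this kind of change."""
        if change_kind in self.coordinate_on:
            return {self.producer} | self.consumers
        return {self.producer}


postgres = Capability("managed-postgres", "data-platform", {"games", "web"})
print(sorted(postgres.teams_to_involve("patch")))     # ['data-platform']
print(sorted(postgres.teams_to_involve("breaking")))  # ['data-platform', 'games', 'web']
```

The point of writing it down is that "when do we need to talk?" gets answered once, in the contract, instead of in a recurring meeting.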

Why product thinking changed the control loop

This was the strongest part of the talk.

Sony realised they had excellent operational observability of their infrastructure, but weak observability of whether the platform itself was working as a product.

They could answer questions like:

what is the CPU usage of this cluster?
what is the health of these nodes?
what is the state of these services?

But they could not answer the more important product questions:

are teams actually adopting the capability we built?
how quickly do users get value from it?
is it easy to use?
does it reduce or increase workarounds?

That is the shift from platform as project to platform as product.

The roadmap can no longer be measured only by milestones, scope, or backlog movement. Those metrics tell you whether you shipped something. They do not tell you whether anyone wanted it in the form you shipped.

Sony changed the control loop by looking at:

time to value
adoption
user satisfaction
operational efficiency

That is a much healthier set of signals for an internal platform.
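Most of these signals are cheap to compute once the raw events exist. A minimal sketch, assuming you record, per consuming team, when onboarding started, when the team first got value in production, and an optional survey score; the `TeamRecord` fields, the 1-5 scale and the notion of "eligible teams" are assumptions. Operational efficiency is left out because it depends on what the platform runs.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean, median
from typing import Optional


@dataclass
class TeamRecord:
    team: str
    started: date                # when the team began onboarding
    first_value: Optional[date]  # first successful production use, if any
    satisfaction: Optional[int]  # survey score on a 1-5 scale, if answered


def product_signals(records: list[TeamRecord], eligible_teams: int) -> dict[str, float]:
    """Adoption, time to value and satisfaction for one capability."""
    days_to_value = [
        (r.first_value - r.started).days for r in records if r.first_value is not None
    ]
    scores = [r.satisfaction for r in records if r.satisfaction is not None]
    return {
        # share of teams that could use the capability and actually do
        "adoption": len(days_to_value) / eligible_teams,
        "median_days_to_value": median(days_to_value) if days_to_value else float("nan"),
        "mean_satisfaction": mean(scores) if scores else float("nan"),
    }


records = [
    TeamRecord("games", date(2026, 1, 5), date(2026, 1, 12), 4),
    TeamRecord("web", date(2026, 1, 5), date(2026, 2, 4), 3),
    TeamRecord("tools", date(2026, 1, 10), None, 2),  # onboarded, never got value
]
print(product_signals(records, eligible_teams=5))
```

Here two of five eligible teams have adopted the capability, with a median of 18.5 days to first value; none of that would show up in a CPU or node-health dashboard.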

Apply this: change what 'done' means on your next delivery

Try this for one capability in the next sprint: don't count it as done when it ships — count it as done when a team uses it without help and doesn't file an exception. That one change in definition will surface documentation gaps, UX friction, and missing support paths that the delivery milestone never would.

The most useful idea: change what "done" means

Their redefinition of done is worth copying.

Previously, done meant a feature or project had been completed and delivered.

After the shift, done meant something users could rely on.

That includes more than implementation:

integrated
documented
supported
adopted
useful enough that teams actually depend on it

That is a better bar for platform work.

Internal platforms accumulate debt very quickly when teams celebrate delivery before they prove usefulness.
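The stricter definition can be written down as an explicit checklist so that every review asks the same questions. A minimal sketch; the field names and thresholds (at least one team using it without help, zero open exception requests) are assumptions modelled on the criteria above, not a scheme from the talk.

```python
from dataclasses import dataclass


@dataclass
class CapabilityStatus:
    shipped: bool
    integrated: bool
    documented: bool
    has_support_path: bool
    teams_using_unassisted: int   # teams running it in production without help
    open_exception_requests: int  # teams still asking to bypass or work around it


def gaps(s: CapabilityStatus, min_unassisted_teams: int = 1) -> list[str]:
    """Every part of the 'relied on' bar that is not met yet."""
    checks = {
        "not shipped": s.shipped,
        "not integrated": s.integrated,
        "not documented": s.documented,
        "no support path": s.has_support_path,
        "no team uses it without help": s.teams_using_unassisted >= min_unassisted_teams,
        "teams still request exceptions": s.open_exception_requests == 0,
    }
    return [label for label, ok in checks.items() if not ok]


def is_done(s: CapabilityStatus) -> bool:
    return not gaps(s)


status = CapabilityStatus(
    shipped=True, integrated=True, documented=False, has_support_path=True,
    teams_using_unassisted=0, open_exception_requests=2,
)
print(is_done(status))  # False
print(gaps(status))
# ['not documented', 'no team uses it without help', 'teams still request exceptions']
```

Returning the list of gaps rather than a bare boolean keeps the review focused on what is missing, which is where the documentation and UX friction shows up.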

What I would copy from this talk

If I were applying this in a real platform team, I would start with four things.

1. Treat platform bypasses as product research

If teams keep building side paths, do not start with policy. Start with diagnosis.

Which need is not being met? What is too rigid? What is too slow? What is too opaque?

2. Reduce dependency sprawl between teams

If five teams need to coordinate for one normal capability change, you probably have a boundary problem.

Make the interfaces clearer or move the ownership.
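A rough way to spot this is to record which components each routine change touches and count the distinct owning teams. A minimal sketch; the component-to-owner map, the change names and the limit of five (taken from the example above) are illustrative assumptions.

```python
def teams_per_change(changes: dict[str, set[str]], owner_of: dict[str, str]) -> dict[str, int]:
    """Number of distinct owning teams each change has to coordinate with."""
    return {
        change: len({owner_of[component] for component in components})
        for change, components in changes.items()
    }


def boundary_smells(
    changes: dict[str, set[str]], owner_of: dict[str, str], limit: int = 5
) -> list[str]:
    """Routine changes that pull in `limit` or more teams."""
    counts = teams_per_change(changes, owner_of)
    return sorted(change for change, n in counts.items() if n >= limit)


owner_of = {
    "ingress": "networking", "dns": "networking", "certs": "security",
    "secrets": "security", "db": "data", "ci": "delivery", "dashboards": "sre",
}
changes = {
    "add-game-service": {"ingress", "dns", "certs", "db", "ci", "dashboards"},
    "rotate-cert": {"certs", "secrets"},
}
print(teams_per_change(changes, owner_of))  # {'add-game-service': 5, 'rotate-cert': 1}
print(boundary_smells(changes, owner_of))   # ['add-game-service']
```

Counting owners rather than components matters: six components owned by one team is a large change, not a coordination problem.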

3. Add product signals to platform reviews

Ask about adoption, time to first value, satisfaction, and workarounds alongside the usual reliability and capacity metrics.

Without those signals, platform teams can be very busy and still strategically off course.

4. Raise the definition of done

Do not count a capability as complete just because it shipped.

Count it as complete when teams can use it confidently and stop reaching for alternatives.

The bigger point

The most important line in the talk was not about Kubernetes or controllers. It was that scaling the platform required changes across three layers:

architecture
team interaction
feedback loops

That is the right model.

Many internal platforms struggle because one of those three layers improves while the others stay stuck.

You can have clean abstractions and weak feedback. You can have strong teams and poor boundaries. You can have good infrastructure and bad product thinking.

Platform engineering starts to look more effective when those three layers move together.

References

Frequently asked questions

Where can I watch the Sony KubeCon session?

The talk was presented by Hagen Eugenia at KubeCon EU 2026. Find it on the CNCF YouTube channel — search "Sony platform engineering KubeCon EU 2026". The conference PDF is also linked in the References section.

How do you identify whether your platform has a product problem vs a technical problem?

Ask two questions. First: do teams that understand the platform well still find workarounds? If yes, it's a product problem — the capability doesn't fit the use case. Second: are bypass rates going up while features are being shipped? If yes, you're building for the platform's vision rather than user needs. Technical problems disappear when you fix the code; product problems persist regardless of code quality.

Is this 'platform as product' approach only relevant for large organisations like Sony?

No — the feedback loop breakdown Sony described (building without measuring adoption) happens at 5 engineers as easily as at 500. The scale determines the complexity of the symptoms, not the underlying cause. Small platform teams benefit from product thinking even more, because wasted effort on unused capabilities is proportionally more damaging when the team is small.

How does the sociotechnical framing connect to day-to-day platform work?

Sociotechnical means the technical and organisational systems are coupled — you can't improve one without considering the other. In practice: every architectural decision has a team interaction implication. Adding a new capability changes who needs to coordinate and how. If you design the architecture without mapping the coordination impact, you'll ship clean code into a messy coordination pattern and wonder why adoption is slow.

What metric would you watch first to know if the product-thinking shift is working?

Exception volume — the count of teams requesting direct access, custom deployment paths, or policy exemptions. If the platform is working as a product, exceptions should decrease as capabilities improve. If exceptions are growing alongside feature delivery, the platform is solving problems teams don't have while missing ones they do. Track it quarterly.
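Exception volume is easy to track if each request is logged with a date. A minimal sketch, assuming exception requests are available as a list of dates and feature deliveries as a count per quarter; the function names and the "rose while features were shipping" rule are illustrative, derived from the answer above rather than from the talk.

```python
from collections import Counter
from datetime import date


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2026-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def exceptions_per_quarter(requests: list[date]) -> dict[str, int]:
    """Count exception requests (direct access, custom paths, exemptions) per quarter."""
    return dict(sorted(Counter(quarter(d) for d in requests).items()))


def misfit_quarters(exceptions: dict[str, int], features: dict[str, int]) -> list[str]:
    """Quarters in which exception volume rose while features were still shipping.

    Quarters missing from both inputs are skipped, so pad with zeros first
    if you need a gap-free series.
    """
    quarters = sorted(set(exceptions) | set(features))
    return [
        cur
        for prev, cur in zip(quarters, quarters[1:])
        if exceptions.get(cur, 0) > exceptions.get(prev, 0) and features.get(cur, 0) > 0
    ]


requests = [
    date(2025, 11, 3), date(2026, 1, 15), date(2026, 2, 2),
    date(2026, 3, 30), date(2026, 4, 1),
]
per_quarter = exceptions_per_quarter(requests)  # {'2025-Q4': 1, '2026-Q1': 3, '2026-Q2': 1}
features = {"2025-Q4": 2, "2026-Q1": 4, "2026-Q2": 3}
print(misfit_quarters(per_quarter, features))  # ['2026-Q1']
```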