Platform engineering is a sociotechnical problem

This was one of the better platform talks at KubeCon because it dealt with a problem most teams eventually hit and rarely admit cleanly: the platform looks solid, the engineering is serious, the abstractions are tidy, and yet users still keep finding ways around it.

Sony Interactive Entertainment used that tension as the starting point for a much more honest discussion about internal platforms. The core lesson was simple: once a platform serves multiple teams with different constraints, architecture stops being the whole story. What matters just as much is how teams interact, how decisions are made, and how you measure whether any of it is actually helping.

Quick takeaways

  • Platform bypasses are usually feedback, not just non-compliance.
  • Better architecture helps, but it does not remove coordination failure.
  • Product thinking starts when you measure adoption and usability, not just delivery.
  • "Done" should mean relied on, not merely implemented.

Figure: Producer-consumer platform model

What was getting in the way

Sony described a pattern that will sound familiar to a lot of platform teams.

They had already done the work people normally recommend:

  • operators
  • pipelines
  • standardised deployment paths
  • abstractions and golden paths
  • cross-functional platform teams

From the outside, the platform looked mature.

But inside the organisation, the signals told a different story:

  • backlog kept growing
  • teams kept asking for exceptions
  • some teams wanted direct access to lower-level infrastructure
  • other teams bypassed the platform entirely and rebuilt pieces themselves

That was not happening because users wanted chaos. It was happening because the platform did not always fit the reality of the teams consuming it.

That reframing matters. Once you treat those behaviours as product feedback rather than governance failure, the problem becomes much clearer.

Why architecture was necessary but not sufficient

One part of the talk focused on architecture patterns that still matter a lot:

  • controller logic
  • reconciliation loops
  • clear resource contracts
  • composable boundaries
  • producer-consumer relationships between teams

That is good platform engineering. It gives teams cleaner interfaces and a more reasonable mental model.
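
To make the controller and reconciliation-loop pattern concrete, here is a minimal Python sketch. It is not Sony's implementation: the `Action` type, the `reconcile` function, and the service names are hypothetical, and the "state" is reduced to replica counts. It only computes a plan, where a real controller would run the comparison repeatedly and apply each step through an API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step needed to move observed state toward desired state."""
    verb: str  # "create", "update" or "delete"
    name: str  # resource name (hypothetical)


def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[Action]:
    """Compare desired and observed replica counts and return the actions
    that would close the gap. Nothing is applied here; this is the
    'diff' half of a reconciliation loop."""
    actions = []
    for name, replicas in sorted(desired.items()):
        if name not in observed:
            actions.append(Action("create", name))
        elif observed[name] != replicas:
            actions.append(Action("update", name))
    # Anything observed but no longer desired should be removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(Action("delete", name))
    return actions


# Hypothetical example data.
desired = {"checkout": 3, "search": 2}
observed = {"checkout": 1, "legacy-batch": 1}
plan = reconcile(desired, observed)
for action in plan:
    print(action.verb, action.name)
```

The useful property is that the consumer only declares `desired`; the producer owns the logic that converges on it. That is the "clear resource contract" in miniature.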

But Sony's experience was that architecture improvements did not automatically solve scaling problems between teams. As the number of consumers and capabilities grew, coordination overhead grew with it.

That is where a lot of platform teams get stuck. They keep refining the technical model while the real friction is increasingly organisational.

What broke next: team interaction at scale

Sony described how a previously simple communication model stopped working once more teams entered the platform ecosystem.

In the earlier stage, a team that needed something could just speak directly to the one other team that owned it. Responsibilities were clear. Boundaries were understandable. Coordination cost was manageable.

As the platform expanded, that turned into a mesh of dependencies. A single capability could require several teams in the room at once. Release delays, hidden dependencies, and last-minute escalations became more common. Small changes could take weeks.

They responded by mapping team interaction paths and trying to make the delivery flow more understandable. That helped at first, but it also exposed a trap: too many interactions began converging through an enablement team.

That is a useful warning sign.

If every dependency ends up routed through the same group, you have not solved the coordination problem. You have centralised it.
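
One rough way to spot that centralisation is to record interaction paths and check how often a single team sits in the middle. The sketch below is an assumption-laden illustration, not something shown in the talk: the team names, the `paths` data, and the `intermediary_share` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical interaction paths: each tuple lists the teams a request
# passes through, in order, from requester to capability owner.
paths = [
    ("payments", "enablement", "networking"),
    ("search", "enablement", "storage"),
    ("payments", "enablement", "storage"),
    ("search", "observability"),
]


def intermediary_share(paths):
    """Return, per team, the fraction of paths in which that team sits
    between requester and owner (neither first nor last in the path)."""
    counts = Counter()
    for path in paths:
        # set() so a team is counted at most once per path.
        for team in set(path[1:-1]):
            counts[team] += 1
    return {team: n / len(paths) for team, n in counts.items()}


shares = intermediary_share(paths)
for team, share in shares.items():
    print(f"{team}: {share:.0%} of paths")
```

With this sample data, one team mediates three of the four paths. What threshold counts as "too centralised" is a judgment call for each organisation.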

The better question: who produces, who consumes, and when do they need to talk?

One of the most practical ideas in the talk was shifting the conversation towards producer-consumer relationships.

That framing helps answer three useful questions:

  • who owns this capability?
  • who depends on it?
  • when is interaction actually necessary?

That is especially important in globally distributed organisations, where you cannot afford to solve every ambiguity with another recurring meeting.

Sony used this thinking to define clearer capability boundaries and encourage more local decision-making by teams, while still aligning with higher-level organisational goals.

That is a good model for platform teams in general: fewer implicit dependencies, fewer permanent coordination channels, more explicit contracts.
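
The three questions above map naturally onto a small capability record. The following Python sketch is illustrative only: the `Capability` type, the versioning rule, and the team names are assumptions made for the example, not details from the talk. The assumed rule is that consumers need to be consulted only when the contract changes in a breaking way.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Capability:
    name: str
    owner: str  # who owns this capability?
    consumers: set[str] = field(default_factory=set)  # who depends on it?
    contract_version: tuple[int, int] = (1, 0)  # (major, minor)


def teams_to_consult(cap: Capability, new_version: tuple[int, int]) -> set[str]:
    """Assumed rule: consumers are involved only when the major contract
    version changes; minor changes stay a local decision for the owner."""
    if new_version[0] != cap.contract_version[0]:
        return set(cap.consumers)
    return set()


# Hypothetical capability and consumers.
db = Capability("managed-postgres", owner="data-platform",
                consumers={"payments", "search"})
print(sorted(teams_to_consult(db, (1, 1))))  # minor change
print(sorted(teams_to_consult(db, (2, 0))))  # breaking change
```

The point of writing it down this way is that "when do we need to talk?" becomes a property of the contract rather than a standing meeting.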

Why product thinking changed the control loop

This was the strongest part of the talk.

Sony realised they had excellent operational observability of their infrastructure, but weak observability of whether the platform itself was working as a product.

They could answer questions like:

  • what is the CPU usage of this cluster?
  • what is the health of these nodes?
  • what is the state of these services?

But they could not answer the more important product questions:

  • are teams actually adopting the capability we built?
  • how quickly do users get value from it?
  • is it easy to use?
  • does it reduce or increase workarounds?

That is the shift from platform as project to platform as product.

The roadmap can no longer be measured only by milestones, scope, or backlog movement. Those metrics tell you whether you shipped something. They do not tell you whether anyone wanted it in the form you shipped.

Sony changed the control loop by looking at:

  • time to value
  • adoption
  • user satisfaction
  • operational efficiency

That is a much healthier set of signals for an internal platform.
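
Two of those signals, adoption and time to value, are straightforward to compute once the underlying events are recorded. The sketch below assumes a hypothetical data shape (an onboarding date and a first production use date per team) and hypothetical team names; the talk did not specify how Sony calculates these numbers.

```python
from datetime import date
from statistics import median

# Hypothetical records: when each team was onboarded to a capability and
# when it first ran a production workload on it (None = not yet).
records = {
    "payments": (date(2024, 3, 1), date(2024, 3, 4)),
    "search": (date(2024, 3, 1), date(2024, 3, 15)),
    "identity": (date(2024, 3, 8), None),
    "catalog": (date(2024, 3, 8), date(2024, 3, 13)),
}


def adoption_rate(records):
    """Fraction of onboarded teams that have reached first production use."""
    adopted = [first for _, first in records.values() if first is not None]
    return len(adopted) / len(records)


def median_days_to_value(records):
    """Median days from onboarding to first production use, counting only
    teams that got there. Raises StatisticsError if no team has."""
    days = [
        (first - onboarded).days
        for onboarded, first in records.values()
        if first is not None
    ]
    return median(days)


print(f"adoption: {adoption_rate(records):.0%}")
print(f"median time to value: {median_days_to_value(records)} days")
```

Satisfaction and workaround counts need survey or ticket data instead, but the same idea applies: decide what event counts as "value" and measure from there.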

The most useful idea: change what "done" means

Their redefinition of done is worth copying.

Previously, done meant a feature or project had been completed and delivered.

After the shift, done meant something users could rely on.

That includes more than implementation:

  • integrated
  • documented
  • supported
  • adopted
  • useful enough that teams actually depend on it

That is a better bar for platform work.
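
If it helps to make that bar explicit, the checklist can be encoded so that "done" is something a review can check rather than assert. This is a hypothetical sketch: the `DoneCriteria` type and its field names simply mirror the list above and are not taken from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class DoneCriteria:
    integrated: bool = False
    documented: bool = False
    supported: bool = False
    adopted: bool = False
    depended_on: bool = False  # teams actually rely on it

    def missing(self) -> list[str]:
        """Names of the criteria not yet met, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_done(self) -> bool:
        return not self.missing()


# A capability that has shipped with docs but has no support or adoption yet.
shipped = DoneCriteria(integrated=True, documented=True)
print(shipped.is_done(), shipped.missing())
```

Under the old definition, `shipped` would have been closed out; under the new one it still has three open items.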

Internal platforms accumulate debt very quickly when teams celebrate delivery before they prove usefulness.

What I would copy from this talk

If I were applying this in a real platform team, I would start with four things.

1. Treat platform bypasses as product research

If teams keep building side paths, do not start with policy. Start with diagnosis.

Which need is not being met? What is too rigid? What is too slow? What is too opaque?

2. Reduce dependency sprawl between teams

If five teams need to coordinate for one normal capability change, you probably have a boundary problem.

Make the interfaces clearer or move the ownership.

3. Add product signals to platform reviews

Ask about adoption, time to first value, satisfaction, and workarounds alongside the usual reliability and capacity metrics.

Without those signals, platform teams can be very busy and still strategically off course.

4. Raise the definition of done

Do not count a capability as complete just because it shipped.

Count it as complete when teams can use it confidently and stop reaching for alternatives.

The bigger point

The most important line in the talk was not about Kubernetes or controllers. It was that scaling the platform required changes across three layers:

  • architecture
  • team interaction
  • feedback loops

That is the right model.

Many internal platforms struggle because one of those three layers improves while the others stay stuck.

You can have clean abstractions and weak feedback. You can have strong teams and poor boundaries. You can have good infrastructure and bad product thinking.

Platform engineering starts to look more effective when those three layers move together.

References