Systems Over Symptoms

There's a recurring meeting on many engineering teams. Someone presents an incident from the previous week. The team discusses what went wrong and identifies action items. Everyone feels good about preventing it from happening again.

Two weeks later, a similar incident occurs. Different symptoms, same underlying problem.

This is the symptom trap, and it's everywhere in engineering organizations.

Symptoms vs Systems

Symptoms are what you see: the outage, the bug, the missed deadline, the frustrated engineer.

Systems are what create the symptoms: the processes, incentives, constraints, and feedback loops that make those outcomes likely.

When you fix a symptom, you solve the immediate problem. When you fix a system, you prevent a class of problems.

The Urge to Treat Symptoms

Treating symptoms feels good. It's concrete. You can point to it and say "I fixed that."

Treating systems is harder. It's abstract. The payoff is delayed. You might not even see the problems you prevented because... they never happened.

But treating only symptoms creates a treadmill. You're constantly fighting fires instead of making the system fire-resistant.

Examples of System Fixes

Symptom: An engineer deploys a breaking change to production.

Symptom fix: Revert the change, chastise the engineer, add a checklist item.

System fix: Implement CI/CD gates that catch breaking changes automatically, before they reach production. Build a culture where staging is actually representative of production. Make rollbacks a one-click operation.

Symptom: A team misses their sprint commitment three times in a row.

Symptom fix: Ask them to work harder, extend the sprint, or reduce scope.

System fix: Examine why estimates are off. Is there enough discovery before commitment? Are dependencies clear? Is scope creep happening? Fix the planning process.

Symptom: Engineers are burning out and leaving.

Symptom fix: Offer them raises, send them to conferences, give them better titles.

System fix: Examine workload distribution. Are on-call rotations fair? Is there psychological safety to push back on unrealistic deadlines? Are managers supporting their teams or just driving them?

How to Think Systemically

Ask "Why" Five Times

When something goes wrong, don't stop at the immediate cause. Keep asking why until you get to the underlying system.

Why did the bug make it to production? → No test caught it.
Why didn't the test catch it? → The test environment is different from prod.
Why is the test environment different? → It's expensive to keep them in sync.
Why is it expensive? → We have too many microservices with different configs.
Why do we have so many microservices? → We didn't set architecture guardrails.

Now you know the system problem: lack of architectural standards and environment parity.
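To make "environment parity" something you can check rather than just talk about, here is a minimal sketch in Python. It assumes each environment's settings can be loaded into a flat dictionary; the function name config_drift, the "<missing>" marker, and every key and value in the sample data are made up for illustration and not taken from any real tool.

```python
MISSING = "<missing>"  # hypothetical marker for a key absent from one environment


def config_drift(staging, prod, ignore=()):
    """Return {key: (staging_value, prod_value)} for every key whose values differ.

    Keys listed in `ignore` are skipped (some settings, such as hostnames,
    are expected to differ between environments).
    """
    drift = {}
    for key in sorted(set(staging) | set(prod)):
        if key in ignore:
            continue
        s = staging.get(key, MISSING)
        p = prod.get(key, MISSING)
        if s != p:
            drift[key] = (s, p)
    return drift


# Illustrative sample data only.
staging = {"db_pool_size": 5, "feature_x": True, "timeout_s": 30, "db_host": "stg-db"}
prod = {"db_pool_size": 50, "feature_x": True, "timeout_s": 30, "db_host": "prod-db",
        "cache_ttl_s": 600}

drift = config_drift(staging, prod, ignore={"db_host"})
for key, (s, p) in drift.items():
    print(f"{key}: staging={s!r} prod={p!r}")
```

Run on a schedule or as part of a deploy pipeline, a check like this turns the answer to the second "why" into a visible number instead of a surprise during an incident.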

Look for Patterns

A one-off incident may be just a symptom. A recurring incident is evidence of a system problem.

If you're having the same type of incident every quarter, the fix isn't another post-mortem. It's changing the system that allows those incidents to happen.
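Spotting the pattern is easier if incidents are tagged with a root cause during review. The sketch below assumes such a tag exists; the Incident class, the recurring_causes helper, the tag names, and all the sample incidents are hypothetical, invented only to show the idea.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    day: date
    title: str
    root_cause: str  # hypothetical tag assigned during the incident review


def recurring_causes(incidents, min_count=2):
    """Return (root_cause, count) pairs seen at least `min_count` times, most frequent first."""
    counts = Counter(i.root_cause for i in incidents)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


# Made-up sample data: different symptoms, shared underlying cause.
incidents = [
    Incident(date(2024, 1, 9), "Checkout errors after deploy", "env-drift"),
    Incident(date(2024, 2, 20), "Search timeouts", "capacity"),
    Incident(date(2024, 4, 2), "Login broken in prod only", "env-drift"),
    Incident(date(2024, 6, 18), "Emails sent twice", "env-drift"),
]

for cause, n in recurring_causes(incidents):
    print(f"{cause}: {n} incidents")
```

The titles look unrelated, which is exactly why each review on its own can miss the pattern. Grouping by cause rather than by symptom is what surfaces it.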

Study Incentives

People behave according to incentives. If teams are incentivized to ship features over quality, you'll get bugs. If they're incentivized to minimize costs over reliability, you'll get outages.

Fix the incentives, fix the behavior.

Map Feedback Loops

Healthy systems have feedback loops that are fast and intact. Unhealthy systems have loops that are slow, broken, or missing.

Do engineers know when they've introduced technical debt?
Do teams see the business impact of their work?
Does leadership understand the engineering constraints?

If feedback loops are broken or slow, the system will drift into undesirable states.

Making the Shift

Moving from symptom-focused to system-focused requires:

Patience: Systems change slowly. The payoff isn't immediate.

Investment: Fixing systems often requires upfront cost for long-term gain.

Measurement: You need to track leading indicators (such as how far the test environment has drifted from production), not just lagging ones (such as the number of outages).

Culture: Everyone needs to understand that preventing fires is more valuable than fighting them.

When to Treat Symptoms

I'm not saying never treat symptoms. Sometimes you need to:

Stop the bleeding when there's an active incident
Buy time while you work on the system fix
Show progress to maintain trust

But always ask: "What's the system change that would prevent this class of problem?"

Then do that too.

Conclusion

Great engineering leaders think in systems. They understand that their job isn't to solve every problem that arises. It's to build systems that prevent problems from arising in the first place.

It's harder. It's less visible. But it's the only way to scale.

Systems over symptoms. Always.