The Error Spiral: Why Bad Operations Get Worse and Good Ones Get Better
- Mac Davis

- Apr 5
And why the difference between the two always compounds
Every plant manager has watched it happen. A changeover gets rushed at the start of second shift. The operator swaps the change parts but doesn't verify something critical. Maybe it's a standard that's been written down, maybe it's not.
The line runs for forty minutes before someone notices something is wrong (dented bottles, crooked labels, wrong materials, wrong print, etc.).
Now there's a quality hold, a financial loss, a mountain of rework, and a cascade of issues through the schedule which requires everyone in your facility to react.
In fact, it may cause people in other facilities to react too. Your customer's schedule changes because you can't deliver. Maybe your customer has emergency orders too.
That one error could cascade not just through multiple people but through multiple companies and facilities.
It's not bad luck. It's predictable, repeatable, and controllable. And it's your job to make it stop happening.
Errors Don't Arrive Alone
Every error in a manufacturing environment does two things simultaneously. It produces an immediate, visible consequence (the defect, the downtime, the rework) and it generates a wave of unplanned work that didn't exist before the error occurred.
That unplanned work lands on a system that was already fully loaded. The supervisor who should be verifying operator process and startup conditions is now managing a quality hold. The lead operator who should be supporting the new hire is now sorting suspect product from the last two hours of production. The scheduler who should be planning tomorrow's run sequence is now recalculating output against a customer deadline and trying to get rush orders of raw materials. The salesperson who should be selling growth is now begging a customer for forgiveness.
And so it spirals.
Unplanned work isn't just additional work. It's work performed under conditions that generate errors: time pressure, incomplete information, interrupted routines, fatigued people working outside their normal patterns. Rushing. Frustration. Habit disruption. Each of these is a recognized error state, a condition that measurably increases the probability that the next task will also go wrong.
The error didn't just cost you the immediate consequence. It purchased a set of conditions specifically likely to produce the next error.
The Spiral Accelerates
One error generates unplanned work. That unplanned work generates error states. Those error states generate new errors. Each new error generates its own wave of unplanned work.
At low error rates, the system absorbs this. Slack capacity exists. Problems get caught at startup. The spiral never gains momentum.
But there is a threshold, and most operations are already past it.
Once unplanned work consumes enough of the system's capacity, the people responsible for preventing errors no longer have time to prevent errors. Supervisors stop verifying setups. Changeover checklists get signed without being completed. New operators get five minutes of instruction instead of fifty. Standards drift without anyone noticing because no one has time to look.
The system is now generating errors faster than it can respond to them. Every response to an error creates conditions for the next one. The spiral is self-sustaining and it accelerates.
This is why struggling operations don't plateau. They deteriorate. The failure cascade isn't a series of independent bad events; it's a single reinforcing loop of failure.
The Flywheel Runs the Other Way
Here's the part that doesn't get talked about enough.
The same compounding dynamic that drives operations into the spiral also drives them out of it.
When errors decrease, unplanned work decreases. When unplanned work decreases, the people responsible for preventing errors get their time back. Supply can check inventory for upcoming runs instead of doing emergency orders. Supervisors can verify setups and processes proactively. New operators get trained because there's time for it. Change parts get inspected before they're needed instead of swapped in a panic during a rushed changeover.
Standards can be verified rather than assumed.
Verified standards prevent errors. Which, in turn, further reduces unplanned work, frees more capacity and prevents more errors.
The flywheel is the spiral running backwards. The compounding effect is identical, but instead of accelerating toward chaos, it accelerates toward stability. Each improvement creates the conditions for the next improvement, and the gains are not linear. They are exponential.
This is what world-class operations actually are. Facilities where the flywheel has been spinning long enough that the compounding benefits have accumulated into something that looks, from the outside, like effortless excellence.
It isn't effortless. It’s not that some facilities just have better equipment and smarter people. It’s the accumulated result of a loop that was pointed in the right direction and allowed to run.
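One way to see why the spiral and the flywheel are the same loop is a toy model. The sketch below is illustrative only. Every number in it (40 hours of capacity per shift, 4 hours of unplanned work per error, the error counts) and the squared relationship between load and errors are assumptions chosen to make the threshold visible, not measurements from any facility.

```python
# Toy model of the error loop. Every constant here is an illustrative
# assumption, not data from a real plant.
CAPACITY_HOURS = 40.0    # hours per shift available for prevention and response
HOURS_PER_ERROR = 4.0    # unplanned work generated by one error
BASE_ERRORS = 0.5        # errors per shift even when nobody is overloaded
PRESSURE_ERRORS = 15.0   # extra errors per shift when the system is fully loaded


def next_load(load: float) -> float:
    """Return next shift's unplanned-work load (0.0 to 1.0) given this shift's.

    Assumption: errors grow with the square of load, because overload both
    removes prevention time and puts people into error states.
    """
    errors = BASE_ERRORS + PRESSURE_ERRORS * load ** 2
    unplanned_hours = errors * HOURS_PER_ERROR
    return min(1.0, unplanned_hours / CAPACITY_HOURS)


def simulate(start_load: float, shifts: int = 30) -> list[float]:
    """Return the starting load followed by the load after each shift."""
    loads = [start_load]
    for _ in range(shifts):
        loads.append(next_load(loads[-1]))
    return loads


if __name__ == "__main__":
    # Two operations that differ only in where they start.
    for start in (0.55, 0.65):
        final = simulate(start)[-1]
        print(f"start load {start:.2f} -> load after 30 shifts {final:.2f}")
```

With these assumed numbers the tipping point sits at a load of roughly 0.61. Start just below it and the load falls shift after shift until it settles near 0.05. Start just above it and the load climbs until the whole shift is consumed by unplanned work. The loop and the arithmetic are identical in both runs; only the side of the threshold differs.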
Every Error Has a System Behind It
Here's the uncomfortable truth that most post-incident reviews never reach.
Errors don't come from bad people. They come from bad conditions, and bad conditions come from systems that either created them or failed to prevent them.
The error states that precede virtually every workplace failure are not personality traits. They are system outputs:
Rushing: When speed is rewarded over correctness, errors follow. A culture that celebrates urgency without discipline is a culture that manufactures its own mistakes. Thought should always precede action, and systems should be designed to make that possible.
Fatigue: When shift design, staffing levels, or the physical demands of the job aren't managed, people deplete. An operator running the same repetitive motion for ten hours isn't careless. The body failing because the system asked too much of it isn't a personnel problem. It's a design problem.
Frustration: People make mistakes when they're angry, and in most cases the anger is justified: systemic errors that never get fixed, recurring problems that keep landing on the same people, processes that make the job harder than it needs to be. Expressions of frustration are your operation telling you something is broken.
Habit disruption: Humphrey's Law describes how we perform at our highest capability when running on established habit. Interruptions, issues which demand attention, and process updates break up the habits people use to achieve consistency. When the interruption isn't managed carefully, the error happens in the gap between habitual action and conscious thought.
Complacency: This is when our habit is no longer aligned with the correct process. We've been doing it wrong long enough that wrong feels right. It's the most dangerous error state because it's invisible. Nothing has recently gone wrong, so nothing feels urgent. The drift is silent until it isn't.
Re-engagement gap: When a process contains natural stops, waiting periods, or handoffs long enough for an operator to mentally disengage, the error happens at the restart. The operator didn't forget the job; they lost the thread of where they were in it. This is a process design failure. If your process allows someone to fully disengage mid-task, the process is creating the error state.
Every single one of these is a controllable variable. Every single one has a system, or the absence of one, behind it.
This means that when an error occurs, the right question is never "Who made the mistake?" It's "Which process produced the conditions that made this mistake probable, and what do we change so it doesn't happen again?"
Errors are not random. They are inevitable when the error states are present. They are the predictable output of systems that haven't been designed carefully enough to prevent them. And if errors are systemic, their elimination is also systemic. It's not a matter of finding better people or demanding more effort, but of fixing the upstream conditions that generate failure in the first place.
Fix the system and you fix the error rate. Fix the error rate and you start the flywheel. The compounding does the rest.
The Implication Nobody Wants to Hear
If errors compound in both directions, then the most important thing a manufacturing operation can do is to prevent them from occurring in the first place, not respond to them faster.
Every dollar spent on rework, sorting, and customer recovery is a dollar that didn't interrupt the spiral. Every hour a supervisor spends managing a quality issue is an hour that didn't go toward the setup verifications, process audits, and operator development that would have prevented it.
This is where error states matter. Errors don't emerge from nothing; they emerge from conditions. And every one of those conditions is predictable, manageable, and eliminable before it produces a consequence.
That's the leverage point. Not faster defect response. Not better root cause analysis after the product is already in the customer's hands. Systematic, upstream elimination of the conditions that generate errors in the first place.
Get ahead of the error states, and you get ahead of the errors. Get ahead of the errors, and the spiral stops. Stop the spiral, and the flywheel starts.
The Advantage Becomes Unfair
Here's what nobody tells you about operations that have been running the flywheel long enough.
The benefits stop being purely operational.
When errors recede, supervisors stop spending their days managing quality holds and rework decisions. They have time to develop their people, to verify setups before the line starts instead of after it fails, to actually lead rather than just react. The job stops being a reactive firestorm. When the job stops being a firestorm, something changes in the people doing it.
Leaders get kinder, and not because the organization ran a culture initiative or hung new values on the wall. It happens because the structural conditions that made them short-tempered, reactive, and punitive have been removed. Chronic stress produces chronic harshness. Remove the stress, and your management team can start to become the leaders they always wanted to be.
When leaders are kinder, the culture changes. People stop dreading coming to work. Tenure increases and institutional knowledge stops walking out the door every quarter. The floor becomes a place where people want to do good work, because good work is recognized, sustained, and possible.
Then something even more powerful happens.
Your reputation changes. Word travels. The facility that used to churn through operators and burn out supervisors starts attracting people who want to work there. Your applicant pool improves. You start selecting from capable candidates rather than whoever will tolerate the environment. Better people make fewer errors. Fewer errors keep the flywheel spinning.
You are now winning on top of winning. The operational advantage compounds into a cultural advantage, which compounds into a talent advantage, which compounds back into an operational advantage. Your competitors, still trapped in the spiral, still firefighting, still burning through supervisors and blaming individuals for systemic failures, cannot close that gap by working harder. The gap is structural. The gap is compounding.
The advantage is unfair. That's the point.
Where It Starts
None of this begins with a culture initiative, a new hire or a reorganization.
It begins with one question: “What are the specific conditions in this facility that are generating errors right now?”
Not which errors and not who caused them, but what conditions made them inevitable, and what would it take to eliminate those conditions before the next shift starts?
Answer that question systematically, consistently, and with enough discipline to let the flywheel build momentum, and the compounding does the rest.
The math works in your favor now. Compounding has been called the eighth wonder of the world, and most people only think about it in terms of money. But errors compound too. And so does their absence.
Systematically address your error states, point the loop in the right direction, and let it run.
Or call me. I can help.




