Systems Safety Engineering in AI
Applying the mindset of instrumentation, feedback, feedforward and human-machine controllers to the complex dynamics of AI.
At about 6pm on 20 April 2010, Deepwater Horizon’s drill-floor team, including chief driller Dewey Revette (48), assistant driller Stephen Curtis (40) and senior tool-pusher Jason Anderson (35), were wrapping up the last safety procedure before capping the Macondo well. They had just pumped a foamed-cement plug 1.5km below the seabed and were running a negative-pressure test to prove the plug was sealed. Think of shutting a garden tap that feeds two hoses: crack both nozzles and both hoses should slacken to zero pressure if the valve is truly closed. On the rig those hoses were the kill-line and the drill-pipe, each with its own pressure gauge. The kill-line dial fell to zero, exactly what was expected. But the drill-pipe gauge stayed stuck at around 96 bar (1400 psi). One hose lay limp, the other hissed unseen pressure: hydrocarbons were already sneaking past the brand-new cement and climbing the pipe. This mismatch was the well’s first clear confession that it was not sealed.1
Properly read, the contradictory gauges should have halted the abandonment plan on the spot. Instead, the crew, under schedule pressure after weeks of delays, chalked it up to a quirky sensor, reopened the valves, and pressed on with replacing the heavy drilling mud with lighter seawater. That error set in motion a cascade that ultimately broke through every defence. At 21:49 the quiet leak became a torrent as it reached the platform; mud geysered onto the deck, gas followed, a blowout preventer that should have stopped the flow failed, and the spewing gas ignited into a fireball. Eleven workers, including Revette, Curtis and Anderson, never made it off the rig.
A catastrophe on that scale is never the product of a single broken part, though it may have started with the failed cement cap. It’s the product of a system that mis-reads, mis-communicates, or normalises deviance until all the small safeguards line up the wrong way. Deepwater Horizon reminds us that mismatched signals, rushed interpretations, and organisational blind spots can turn an ordinary check into an irreversible cascade towards disaster.
So what can engineers and data scientists building high-stakes AI learn from this disaster, and from every complex, rapidly-cascading accident before and since?
My first job as a chemical engineering graduate was in a team designing the control system for a catalytic cracker in a Dutch oil refinery. I had to understand how to implement feedback and control mechanisms that would keep the hydrocarbon cracking process within strict safety and operating constraints. There was little room for error as the temperatures, pressures, and reaction rates had to be precisely managed to avoid damage to equipment or danger to people on site. While we worked towards optimised efficiency and yield, much of the focus, perhaps the overriding focus, was on safety. I was learning to be a safety engineer, working in the aftermath of catastrophic accidents like the Bhopal Gas Disaster (1984), Chernobyl (1986), the North Sea Piper Alpha explosion (1988) and the Exxon Valdez oil spill (1989). It was starting to become clear that the common thread in all of those accidents was not component failure, but control failure.
Later, I worked on modelling and simulating the control systems of oil rigs and refineries in the North Sea, and ethylene plants in Texas, Malaysia, Norway and other remote parts of the world. These were large, tightly coupled systems, with dangerous processes and rapid dynamics. We built dynamic models of how chemical plants operated, and then connected those into real control systems for operator training and simulation. It was a privileged opportunity to witness firsthand how complex adaptive systems, comprised of technologies, processes and people, both fail and succeed in crisis situations.
What I learned in that work has stayed with me through my whole career: failures cascade quickly, and without fast, reliable feedback and robust safety constraints, they become catastrophic. Our control systems couldn’t just be designed for steady-state conditions. They had to respond instantly to disturbances, anticipate upstream events, and be resilient to faults. One plant I worked on synthesised low-density polyethylene in a 2 km-long tubular reactor operating at pressures of 2500 atmospheres with a very short transit time. Controlling such a high hazard process meant using feedforward control (predicting problems before they happened) as well as feedback control (reacting to observed deviations), all layered with design-time analysis, human-in-the-loop supervision and interlocks. But just as important, we had to think of safety at multiple levels. It was never just about valves and temperature loops, but about how to keep an entire organisation operating safely, across engineering, operations, and management.
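If you haven’t built control loops before, a toy example may help make that distinction concrete. The sketch below is purely illustrative, not anything from a real plant: the gains, variable names and numbers are all invented. Feedback acts on a deviation we have already measured; feedforward acts on a measured upstream disturbance before it shows up as a deviation.

```python
# Illustrative sketch only: a toy reactor-temperature controller combining
# feedback (react to the measured error) with feedforward (compensate for a
# measured upstream disturbance before it affects the process).
# All gains, names and numbers are invented for illustration.

def feedback_term(setpoint_c: float, measured_c: float, gain: float = 0.8) -> float:
    """Proportional feedback: act on the deviation we have already observed."""
    return gain * (setpoint_c - measured_c)

def feedforward_term(feed_rate_change: float, gain: float = -0.5) -> float:
    """Feedforward: act on a measured upstream disturbance (e.g. a jump in
    feed rate) before it shows up as a temperature deviation."""
    return gain * feed_rate_change

def heater_adjustment(setpoint_c: float, measured_c: float, feed_rate_change: float) -> float:
    # The two terms are simply summed; interlocks and rate limits would sit on top.
    return feedback_term(setpoint_c, measured_c) + feedforward_term(feed_rate_change)

if __name__ == "__main__":
    # Reactor is 2 degrees cold and the feed rate has just increased by 3 units:
    print(heater_adjustment(setpoint_c=250.0, measured_c=248.0, feed_rate_change=3.0))
```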
Looking back, that safety engineering foundation shaped how I now think about AI systems. When I work with clients planning to deploy AI systems, I see those same patterns. The AI systems we are building are increasingly dynamic and interconnected. They're embedded in messy, real-world environments and they influence consequential outcomes. I find that too often, when we talk about AI governance or ethics, we talk in abstract terms about fairness, transparency and accountability, without grappling with the engineering reality of how to keep these systems under control. In fact, while the combined disciplines of science, engineering, assurance, policy and law are all essential to safe outcomes, it often seems that the voices of engineers are the most absent.
Just as in a refinery, safety doesn’t come from good intentions or policy documents alone. It comes from clear constraints, continuous control, and the ability to respond in real time when the unexpected happens. I have been lucky enough to gain deep experience on both the engineering and governance sides of safe, responsible AI. But if pushed, I would have to say I’m an engineer first - a safety engineer. And although I write a lot about AI governance, I very much believe we need to bring a much deeper safety engineering mindset to AI. Not instead of governance, but alongside it. Both are essential, and when combined, they’re far more effective. We need safety engineering principles, not just compliance frameworks, to build and operate AI systems we can trust.
So this article is a little different from many I’ve written here on Doing AI Governance. It mainly discusses concepts for my engineering friends and peers, who I hope can find a more active role in the decisions and practices of AI safety. However, if you come from a background in law, policy or audit, I hope it remains instructive: a glimpse into the mindset of engineers who think about safety by default.
Over two articles, I want to share how concepts like feedback loops, controller design, and adaptive safety can help us manage AI systems that must remain trustworthy over time. I’ll explore some introductory aspects of systems-theoretic safety principles (including STAMP/STPA) and how they recast accidents as control failures rather than mere component malfunctions. I’ll dig into the tricky challenge of controller architectures in AI, addressing both human and automated monitoring, and the mechanisms we can design for adaptation as AI behaviour evolves. Along the way, I’ll discuss proactive measures (like scenario testing and uncertainty monitoring) and reactive measures (runtime safeguards and post-incident learning), and how multi-level feedback loops from the technical guts all the way up to organisational governance can keep an AI system safe throughout its lifecycle.
I hope you enjoy, and would really welcome any feedback or thoughts. And if you’re a safety engineer working in or interested to work in AI safety, I’d love to talk.
Why AI Needs a Safety-Engineering Lens
AI systems today are not static programs running in a vacuum; they are sociotechnical systems. By that I mean an AI model is usually embedded in a larger context of data pipelines, user interfaces, business processes, and people. It’s part of a complex network that includes technical components and human decision-makers2. The Berkeley AI Research group describes this reality succinctly in its framing of ‘Compound AI Systems’3. Changes in one part (say a shift in user behaviour or a new learning policy) can cascade and affect the AI’s performance in unexpected ways. In other words, AI behaviour is an emergent property of the whole system, not just a trait of the model alone. This is very similar to what we see in complex engineering systems like a chemical plant or an airplane: safety, or the tendency towards system failure, emerges from the interactions of many elements, not just the failure of a single part.
One thing I learned in process control is that as systems get more complex and interconnected, the old ways of thinking about safety (which focused on individual component reliability or simple cause-and-effect chains) start to fall short. Nancy Leveson, a pioneer of system safety engineering, captured this well: “Things can go catastrophically wrong even when every individual component is working precisely as its designers imagined. It’s a matter of unsafe interactions among components.” 4
In the AI context, this rings true: your model could have 99.9% accuracy on paper, every module could be working as designed, and yet the AI system as a whole can still produce a disastrous outcome if the interactions and context align in the wrong way. For example, an AI-driven loan approval system might have well-tuned models for credit risk, but if it’s deployed in a feedback loop with a market or with human behaviour (think of people adapting their behaviour to the AI’s decisions), unforeseen biases or instability can emerge. The takeaway is that safety is a property of the whole system, not just the components. We have to evaluate AI safety in the context of how the AI interacts with its environment and people, not just in isolation.
Viewing AI through a safety-engineering lens also means treating it as a control problem. In a chemical plant, safety was about maintaining control: keeping temperatures, pressures, flows within safe bounds via continuous feedback adjustments. Similarly, with AI, we can think in terms of control loops: the AI is making decisions (control actions) that affect the world or users, we get feedback (in the form of outcomes, user responses, performance metrics), and we (as the human overseers or as an automated monitoring system) need to adjust something (maybe the AI’s parameters, maybe the training data, maybe the operating procedures) to keep the overall system behaviour within safe limits. The goal is to avoid “losses” (accidents or harms) by enforcing constraints on the AI’s behaviour.
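Here’s a minimal sketch of what one pass of such a loop might look like if you wrote it down in code. The metrics, thresholds and action names are invented for illustration; a real system would have far richer feedback and far more nuanced responses, but the shape of the loop is the point.

```python
# A minimal, hypothetical sketch of AI oversight framed as a control loop:
# observe feedback, compare it against safety constraints, and adjust or
# escalate. Metric names, thresholds and actions are invented for illustration.
from dataclasses import dataclass

@dataclass
class Feedback:
    error_rate: float        # observed outcome quality
    complaint_rate: float    # signal from affected users
    drift_score: float       # how far live inputs have moved from training data

SAFETY_CONSTRAINTS = {
    "error_rate": 0.05,
    "complaint_rate": 0.01,
    "drift_score": 0.30,
}

def control_step(fb: Feedback) -> str:
    """One pass of the loop: decide whether the system stays within its safe
    operating envelope, or whether a control action is required."""
    violations = [name for name, limit in SAFETY_CONSTRAINTS.items()
                  if getattr(fb, name) > limit]
    if not violations:
        return "continue"                      # within constraints: no action
    if "drift_score" in violations:
        return "escalate_to_human_review"      # the process model may be stale
    return "tighten_guardrails"                # e.g. raise decision thresholds

print(control_step(Feedback(error_rate=0.02, complaint_rate=0.004, drift_score=0.45)))
```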
This perspective is at the core of STAMP (Systems-Theoretic Accident Model and Processes), a modern safety framework. STAMP basically says: accidents aren’t just the result of component failures; they result from inadequate control or enforcement of safety constraints in a complex system. In other words, if something goes wrong with an AI, we shouldn’t just ask “which component failed?” (was the training dataset biased? did an input contain bad data?). We should ask where was the control structure inadequate? Was there a missing feedback signal? Did the humans in the loop misunderstand what the AI was doing? Was there a safety constraint (like “don’t recommend self-harm content” for a content algorithm) that wasn’t effectively enforced by the system’s design?
This way of thinking pushes us to design AI oversight in terms of control loops and feedback/feedforward, not one-off checks. To some extent, it’s the opposite of, but a complement to, the kind of regulatory and compliance controls stipulated in laws like the EU AI Act.
To ground this in an example, consider a hypothetical AI system managing dosage of a medication for patients (with a human clinician overseeing). A traditional fault-based view might focus on a question like “did the model mis-predict the dosage due to bad training data or an incorrect algorithm?”. A control-based view (à la STAMP) would cast a wider net: How does the clinician (human controller) get feedback about the AI’s recommendations? Is the clinician’s mental model of the AI’s capability accurate, or do they perhaps trust it too much in situations outside its training? Is there an automated monitor (another controller) that watches for unsafe dosage suggestions and can intervene (like a constraint that flags doses beyond a limit)? If an overdose happened, STAMP would say it’s because one or more safety constraints in this control structure were not adequately enforced. Maybe the clinician’s mental model was wrong (they assumed the AI accounts for a certain patient factor when it doesn’t), or the monitor failed to flag a novel scenario. This broader view is incredibly relevant to AI because so often the “failure” is not a single point failure; it’s a coordination and feedback failure among many parts.
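To make the “automated monitor as a second controller” idea concrete, here is a deliberately simplified sketch. The drug limits and rules are entirely invented and are not clinical guidance; the point is that the monitor enforces a safety constraint and flags its intervention to the human controller rather than silently correcting the AI.

```python
# Hypothetical sketch of a dose-limit monitor acting as a second controller
# around an AI recommender. The limits and rules are invented for illustration
# and are not clinical guidance.
from dataclasses import dataclass

@dataclass
class Patient:
    weight_kg: float
    renal_impairment: bool

MAX_DOSE_MG_PER_KG = 1.5      # invented hard constraint
RENAL_REDUCTION_FACTOR = 0.5  # invented adjustment

def monitor(ai_dose_mg: float, patient: Patient) -> tuple[float, bool]:
    """Return (dose to present to the clinician, flag_for_review).
    The monitor never silently 'fixes' the dose; it caps it and raises a flag
    so the human controller's process model is updated too."""
    limit = MAX_DOSE_MG_PER_KG * patient.weight_kg
    if patient.renal_impairment:
        limit *= RENAL_REDUCTION_FACTOR
    if ai_dose_mg > limit:
        return limit, True      # unsafe control action intercepted and flagged
    return ai_dose_mg, False

print(monitor(ai_dose_mg=160.0, patient=Patient(weight_kg=80.0, renal_impairment=True)))
```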
Systems-Theoretic Safety: Accidents as Control Failures, Not Component Malfunctions
Let’s go a bit deeper into systems-theoretic safety, since it provides a vocabulary for what I’m getting at. At the time of my undergraduate studies, Hazard and Operability (HAZOP) studies and Event Tree Analysis (ETA) were the state of the art in hazard identification and analysis. But in the early 2000s, as complex software started running more critical systems, safety engineers realised that the traditional event-chain models of accidents (think of the classic “fault A led to fault B led to disaster” model) were inadequate. Nancy Leveson introduced STAMP as a new way to model accidents. The key insight of STAMP is that safety is an emergent, control property of a system. Instead of viewing an accident as the end of a domino-falling sequence of component failures, STAMP views it as resulting from a lack of proper control of the system’s behaviour. In fact, Leveson argues that just reducing component failure rates doesn’t guarantee safety at all. You could have near-perfectly reliable components and still have an accident if the system’s control structure is flawed. She even argues that the evidence points to greater reliability contributing to an increased likelihood of accidents, as operators mistakenly grow more confident that a reliable system is a safe system.5
According to STAMP, accidents happen when safety constraints are not effectively enforced throughout the system. For example, if there’s a constraint that a self-driving car must not exceed a safe stopping distance given its sensor range, an accident could occur if that constraint wasn’t enforced, perhaps due to a software update that changed braking behaviour (a control action) without proper feedback, or miscommunication between the perception system and the braking controller. In STAMP lingo, this would be an unsafe control action. The car’s AI issued a control action (continue at current speed) that was unsafe under the conditions. Why did that happen? Maybe the AI’s process model (its internal representation of the world) was wrong – perhaps it didn’t realise the road was wet, increasing stopping distance. Or maybe a higher-level controller (say, a safety supervisor module or even the human driver in “monitor” mode) failed to correct it in time. The cause isn’t a single broken part, but a constellation of factors leading to loss of control.
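As a toy illustration of what enforcing that stopping-distance constraint could look like, here is a short sketch. The physics is the standard reaction-distance-plus-braking-distance formula; the deceleration figures are invented, and the interesting part is how an incomplete process model (one that ignores whether the road is wet) quietly makes an unsafe action look safe.

```python
# Illustrative only: checking the constraint "current speed must allow the car
# to stop within its sensed clear distance". The structure of the check is the
# point, not the numbers, which are invented.

def stopping_distance_m(speed_mps: float, decel_mps2: float,
                        reaction_time_s: float = 1.0) -> float:
    # Reaction distance plus braking distance (v^2 / 2a).
    return speed_mps * reaction_time_s + speed_mps ** 2 / (2 * decel_mps2)

def control_action_is_safe(speed_mps: float, clear_distance_m: float,
                           road_is_wet: bool) -> bool:
    # If the process model omits 'road_is_wet', it silently assumes dry braking:
    # exactly the kind of flawed process model STAMP warns about.
    decel = 3.5 if road_is_wet else 7.0   # invented deceleration figures (m/s^2)
    return stopping_distance_m(speed_mps, decel) <= clear_distance_m

# Same speed and same gap: safe on a dry road, unsafe on a wet one.
print(control_action_is_safe(25.0, 80.0, road_is_wet=False))  # True
print(control_action_is_safe(25.0, 80.0, road_is_wet=True))   # False
```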
I previously wrote about the 2023 Cruise robotaxi incident in San Francisco which perfectly illustrates this point6. After a pedestrian was flung into the path of a driverless car, the onboard AI mis-classified the impact as a minor side collision. Its flawed process model told it “nothing is trapped underneath,” so the controller executed its next rule-book action: pull over to the curb. That pull-over dragged the victim roughly 20 feet before the vehicle stopped. Sensors had failed to recognise a body beneath the chassis; the AI issued an unsafe control action; and the supervisory loop (remote support staff and automated monitors) never flagged a high-priority intervention. When Cruise then filed an incomplete report with regulators, it effectively severed the organisational feedback channel that should have driven more rapid corrective action. In STAMP terms, every layer that could have enforced the constraint “do not move if an object is under the vehicle” broke down, from perception to decision logic to external governance.
STPA (Systems-Theoretic Process Analysis) is the practical methodology that comes with STAMP, used to analyse systems for hazard scenarios. It forces you to map out the control loops in the system: identify each controller, the feedback and feedforward paths, the actuators, sensors, and the process being controlled. Then you systematically ask: “What unsafe control actions are possible here? How could each controller produce an unsafe action or fail to act when needed? What process model misunderstandings could lead to that? And what constraints or defences are in place or could be added to prevent it?” This kind of analysis has been used in aviation, space, chemical plants, automotive safety and other domains, and I believe it is an excellent approach to apply to consequential AI systems. It naturally includes software, humans, and organisational factors in the analysis – not just physical component failures. It treats an AI algorithm or a human operator or a regulatory policy all as potential parts of the safety control structure.
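One lightweight way to start an STPA-style mapping for an AI system is simply to write the control structure down as data: the controllers, what each believes it knows (its process model), the control actions it can issue, and candidate unsafe control actions with the constraints that should prevent them. The sketch below uses invented entries for a content moderation system and is nowhere near a complete analysis, but it shows the structure.

```python
# A hypothetical sketch of capturing part of an STPA-style analysis as data:
# controllers, their process models and control actions, plus candidate unsafe
# control actions (UCAs). The entries are invented examples for a content
# moderation system; a real analysis would be far more thorough.
from dataclasses import dataclass

@dataclass
class Controller:
    name: str
    process_model: list[str]          # what this controller believes it knows
    control_actions: list[str]

@dataclass
class UnsafeControlAction:
    controller: str
    action: str
    context: str                      # the condition that makes it unsafe
    constraint: str                   # the safety constraint to enforce

control_structure = [
    Controller("content_filter_model",
               process_model=["current slang lexicon", "policy rules"],
               control_actions=["allow", "block", "route_to_human"]),
    Controller("human_moderator",
               process_model=["queue priorities", "trust in the model"],
               control_actions=["approve", "remove", "escalate"]),
]

ucas = [
    UnsafeControlAction("content_filter_model", "allow",
                        context="new harmful slang not in the lexicon",
                        constraint="route low-confidence content to a human"),
    UnsafeControlAction("human_moderator", "approve",
                        context="over-trusts the model and skims the item",
                        constraint="sample and re-review model-approved items"),
]

for uca in ucas:
    print(f"{uca.controller}/{uca.action}: unsafe when {uca.context} -> {uca.constraint}")
```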
One powerful concept from STPA is the idea of the process model in each controller. A “controller” in this context could be an AI module, a human operator, or an oversight committee: anyone or anything that issues control actions to influence the state of the system. Each controller makes decisions based on its process model: essentially its understanding or model of the current state of the system being controlled and, sometimes, of the environment. In a car’s cruise control system, the controller is a piece of software whose process model might include “current speed,” “target speed,” and perhaps estimates like “road incline.” In a human driver, the process model is their mental picture of the car’s state and environment. Accidents often occur when the controller’s process model is incorrect or becomes outdated. For instance, if the cruise control’s model thinks the car is on level ground when it’s actually going downhill, it might apply the wrong throttle. Or if a human pilot of a plane has an outdated understanding of the autopilot’s mode (a well-known issue in some aviation incidents), they may fail to correct a dangerous situation. In AI systems, consider an automated content filter as a controller: its process model might be the set of rules or machine learning model that classifies “allowed vs disallowed” content. If the content on the platform shifts (say new slang emerges that the model doesn’t recognise as harmful), the model’s understanding (process model) is now flawed. This then leads to potentially unsafe actions (either letting through harmful content or over-blocking innocuous content, each with consequences).
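A small sketch of that failure mode: the controller below timestamps the feedback behind its process model and degrades gracefully, deferring to a human, when that model may have gone stale. Everything here (names, thresholds, the deferral policy) is invented for illustration.

```python
# Minimal, invented sketch: a controller that tracks how old the feedback
# behind its process model is, and defers rather than acting on an outdated
# picture of the world.
import time

class MonitoredController:
    def __init__(self, max_staleness_s: float = 3600.0):
        self.process_model: dict[str, float] = {}   # the controller's beliefs
        self.last_feedback_at = 0.0
        self.max_staleness_s = max_staleness_s

    def ingest_feedback(self, observations: dict[str, float]) -> None:
        """Feedback path: keep the process model aligned with the real process."""
        self.process_model.update(observations)
        self.last_feedback_at = time.time()

    def decide(self, proposed_action: str) -> str:
        age = time.time() - self.last_feedback_at
        if age > self.max_staleness_s:
            # Without fresh feedback we cannot trust our own process model.
            return "defer_to_human"
        return proposed_action

ctrl = MonitoredController(max_staleness_s=5.0)
print(ctrl.decide("serve_recommendation"))        # no feedback yet -> defer_to_human
ctrl.ingest_feedback({"click_through_rate": 0.12})
print(ctrl.decide("serve_recommendation"))        # fresh feedback -> act
```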
This means if we want our AI systems to be safe, we have to ensure that all controllers in the loop – whether that’s the AI itself or a human supervisor – have an accurate and up-to-date process model. But by definition, the model of an AI system is adaptive because it is learning. That brings us to a specific challenge in AI safety: how do we design controllers (human or automated) that can adapt their process models as the AI and its context evolve?
That’s where I’ll pause for now. In the next article I’ll switch from making the case for safety engineering in AI to a more detailed discussion on how to think about adaptive controllers in AI systems, and the types of feedback and feedforward loops we can use.
If you have war-stories, counter-examples, or just a nagging question about the ideas I’ve sketched here, drop a comment or send me a note. Your feedback is really valuable to me as I try to turn this series into a richer playbook for anyone who has to apply the mindset of a safety engineer to keep AI systems safe in the real world.
Thanks as always for reading.
https://www.csb.gov/assets/1/20/appendix_2_a__deepwater_horizon_blowout_preventer_failure_analysis1.pdf
https://arxiv.org/pdf/2410.22526v1
https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/
https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied
https://mitpress.ublish.com/book/an-introduction-to-system-safety-engineering