Choosing the right controls for AI risks
A systematic approach to selecting the most effective prevention, detection, and response controls for major AI risks - both human and technical, and spanning design through to operation.
In my previous three articles, I explored the nature of AI risks across technical, ethical, operational, and strategic domains. I then examined some practical approaches to risk identification - from pre-mortem simulations to dependency chain analysis - and explored frameworks for assessment and prioritisation that factor in both likelihood and the unique velocity and feedback characteristics of AI risks.
Now comes the moment of truth: translating risk awareness into a concrete set of actions. I think most AI governance professionals would agree that this is the most challenging part of the discipline, demanding the most knowledge and skill. In this article, I'll guide you through that complex terrain of AI risk treatments, exploring how to select and implement controls that match the unique profile of each risk. I'll talk through when design-time preventive measures are most effective versus when run-time monitoring becomes essential. And I'll share some practical examples of controls specifically designed for common AI failure modes - from model drift to algorithmic bias, from adversarial attacks to feedback loops.
I provide a complete chart that you may find useful as a starting point when you face these same challenges. If you enjoyed the controls mega-map that I previously published, then this is the flip-side of that work, starting not from the regulatory control requirement, but from the specific risks you’ll likely encounter.
I hope you enjoy reading this as much as I enjoyed the work of putting it together.
From Risk Assessment to Risk Treatment
Before diving into specific risk treatments, it's important to understand the broader risk process going on here. After identifying and assessing risks, organisations typically follow an approach that looks something like the following, especially for those risks that are assessed to be beyond the organisation’s allowable risk tolerance:
Evaluate Existing Controls: Consider what controls are already in place - perhaps as part of broader compliance requirements or existing enterprise-wide controls. You probably already have baseline controls (like access management, model documentation, or basic monitoring) that address certain risks at least partially.
Analyse the Gaps and Select Treatments: Identify additional controls or actions that could further reduce the risk's impact or likelihood (or indeed its velocity or feedback). In this step, you figure out if and how to supplement existing safeguards. This is the main focus of this article.
Residual Risk Assessment: After applying controls, reassess the "residual risk" that remains. Is it now within your organisation's risk tolerance? If not, you may need additional measures or to reconsider the AI system's design or purpose entirely.
Implementation and Monitoring: Finally, implement the selected controls and continuously monitor their effectiveness, adapting as necessary.
The challenge with AI systems is that risks often evolve over time as data changes, models learn, and usage patterns shift. This means risk treatment isn't a one-time activity but an ongoing process of adaptation and refinement.
As a first step though, let’s quickly baseline two important axes of our selection process: the lifecycle stage and the control purpose.
Lifecycle Stage: Design-Time vs. Run-Time Controls
Design-Time Controls are implemented during system development, before deployment. These preventive measures are "baked in" to the AI system's architecture, data selection, and model training processes. They aim to eliminate or reduce risks from the outset.
Run-Time Controls operate while the system is in use. These include monitoring mechanisms, operational safeguards, and response procedures that detect and address issues as they emerge during the AI system's operation.
Most effective AI risk management requires a combination of both. Design-time controls establish a strong foundation, while run-time controls provide adaptability to changing conditions and emerging risks. The balance between them depends on the specific risk profile and the nature of the AI system.
Control Purpose: Prevention, Detection, and Response
Within both design-time and run-time categories, controls can serve different purposes:
Preventive Controls aim to stop risks from materialising in the first place. Examples include robust training data verification, adversarial training, and fairness-aware algorithm design.
Detective Controls monitor for signs that a risk is emerging or has occurred. These include drift detection algorithms, performance monitoring, and anomaly detection systems.
Response Controls activate when risks do materialise, minimising impact and restoring normal operation. They might include model rollbacks, human override procedures, or incident response protocols.
A comprehensive risk treatment approach incorporates controls of all three types, creating layers of defence against AI failures.
Choosing Risk Treatments for Specific AI Risks
I’ll use a series of eight illustrative AI risks to explain how we make these kinds of selection choices. Be aware that although these are some of the most common risk types and relevant controls you’ll encounter, this is far from an exhaustive list. Each risk type is addressed with both design-time measures (to prevent or reduce the risk before the AI system is launched) and run-time measures (to monitor, detect, or respond to issues as the system operates).
Here you can download an illustration of these risks, mapped to candidate preventive, detective and response controls at both design-time and run-time.
Model Drift and Data Distribution Shift
One persistent AI risk is model drift, where an ML model’s performance degrades over time because the real-world data diverges from the training data. Also known as data distribution shift, this can lead to increasing error rates or unfair outcomes as the model encounters new patterns it wasn’t trained on. It’s critical in AI governance because even a well-trained model can become unreliable if it isn’t maintained.
Design-Time Controls: Before deploying an AI model, designers can build in resilience to drift. For example, robust training methods are used to expose the model to a wide range of scenarios (e.g. using diverse, up-to-date training data and including likely future variations). Data augmentation or simulation can help the model generalise beyond the original dataset. Teams should also perform stress tests by emulating potential distribution shifts in a staging environment – for instance, testing a vision model on slightly altered lighting conditions or a recommendation algorithm on emerging user trends. Additionally, you can establish a model update pipeline from the start: plan for periodic retraining or fine-tuning when new data arrives. This might include setting acceptable performance thresholds and creating a retraining dataset that will be used once the model sees enough new information. In essence, the design phase should assume that drift will happen and incorporate the tools to handle it (such as embedding drift detection components or alerts into the system architecture).
Run-Time Controls: Operationally, continuous monitoring is the most appropriate defence against model drift. Organisations can deploy drift detection techniques that watch data and outputs in real time. For example, statistical metrics like Population Stability Index or KL divergence can measure if the incoming live data distribution is shifting significantly from the training distribution. If those metrics cross a preset threshold, an alert can be triggered to indicate that the model may be out of sync with its environment. Likewise, monitoring model performance on a rolling basis (via a feedback loop of actual outcomes or human evaluation) can catch drift – e.g. an increase in prediction errors or user dissatisfaction might indicate the model no longer fits the data.
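To make this concrete, here is a minimal sketch of a PSI-based drift check, assuming you keep a reference sample of a feature from training time and compare it against a recent window of live values. The 0.1 and 0.25 thresholds are common rules of thumb rather than universal standards, and the random data is purely illustrative.

```python
# A minimal sketch of run-time drift detection using the Population Stability Index (PSI).
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """Compare two 1-D samples of the same feature, binned on the expected (training) data."""
    breakpoints = np.percentile(expected, np.linspace(0, 100, n_bins + 1))
    breakpoints[0], breakpoints[-1] = -np.inf, np.inf  # cover the full range of live values
    expected_pct = np.histogram(expected, bins=breakpoints)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=breakpoints)[0] / len(actual)
    # Clip to avoid division by zero / log(0) for empty bins
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

training_sample = np.random.normal(0, 1, 10_000)   # stand-in for a training-time feature sample
live_window = np.random.normal(0.4, 1.2, 2_000)    # stand-in for the most recent live data

psi = population_stability_index(training_sample, live_window)
if psi > 0.25:
    print(f"ALERT: significant drift detected (PSI={psi:.3f}) - trigger retraining review")
elif psi > 0.1:
    print(f"WARNING: moderate drift (PSI={psi:.3f}) - monitor closely")
```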
When drift is detected, the treatment is to intervene promptly: options include retraining the model with the latest data, adjusting it (if it’s an online learning system), or, in critical systems, rolling back to a simpler but more stable decision rule until the model is fixed. Some organisations implement automated re-training pipelines so that models get refreshed on new data regularly (for example, retraining every week on recent data). It’s also wise to have fallback plans – if the model’s confidence in predictions drops or it encounters data far outside its training scope, the system can escalate to human review or a rules-based decision to maintain safety. All up, combating drift involves monitoring for early warning signs and having mechanisms to quickly update or replace the model before severe degradation occurs.
Hallucinations in Generative Models
Generative AI models (like large language models and image generators) sometimes produce outputs that are plausible-sounding but incorrect or fabricated – commonly known as hallucinations. For example, a text model may confidently state a false “fact” or cite a non-existent source. Hallucinations undermine trust and can pose risks (from misinformation to bad business decisions) when AI content is used without verification. Reducing hallucination risk requires both designing the model to be more truthful and controlling its outputs during operation.
Design-Time Controls: Several strategies can be implemented at model development time to minimise hallucinations. One approach is fine-tuning on high-quality, domain-specific data. By training the model further on verified information in a particular domain, it aligns its knowledge with ground truth, making it less likely to generate unfounded content. Another powerful design strategy is Retrieval-Augmented Generation (RAG), which integrates an external knowledge base or database into the model’s generation process. Instead of relying purely on its internal parameters (which might be outdated or fictional), the model queries a source of truth (like a vetted document repository or the web) for facts, and then uses that to inform its answer. This effectively grounds the output in real data, greatly reducing hallucination in contexts like question-answering systems.
Incorporating techniques like Reinforcement Learning from Human Feedback (RLHF) can also steer a generative model towards truthfulness. In RLHF, during training, the model is penalised when it produces incorrect statements and rewarded for correct or honest ones (based on human evaluators’ judgments), which teaches it to avoid certain types of mistakes. Other design-time measures include model architecture choices that enforce consistency (for instance, using ensemble models that must agree on an answer, or adding a module for confidence estimation so the model can output an “I’m not sure” or abstain when it is likely guessing). Basically, at design time we’re trying to inject knowledge, feedback, and constraints into generative models to make hallucinations less frequent (although none of these techniques can completely eliminate them).
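As an illustration of the RAG pattern described above, here is a minimal sketch. The `call_llm` function is a placeholder for whatever model API you use, and the naive keyword-overlap retriever stands in for a real vector search over a vetted document repository; the knowledge-base entries are invented.

```python
# A minimal sketch of Retrieval-Augmented Generation (RAG): ground the answer
# in retrieved documents rather than the model's parametric memory.

KNOWLEDGE_BASE = [
    "The warranty period for Model X is 24 months from the date of purchase.",
    "Support tickets are answered within two business days.",
    "Model X supports firmware updates over USB and Wi-Fi.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (stand-in for vector search)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with your model/provider API call")

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```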
Run-Time Controls: No matter how well a model is trained, there remains a chance of hallucination at run-time, so active measures are needed while the AI is generating content. One important control is real-time fact-checking and filtering. For instance, after the model produces an output, the system can automatically cross-verify any factual claims against trusted sources (APIs, databases, search engines). If discrepancies are found – e.g. the model cites a statistic that isn’t confirmed – the system might correct the output or at least flag it for a human to review. In user-facing applications, it’s common to implement output filters: these could range from simple keyword-based checks (removing obviously false or harmful statements) to more complex secondary AI that evaluates the primary model’s output for truthfulness. Another technique is to use confidence thresholds.
If the model’s internal confidence in a particular piece of generated text is below a certain level (which often correlates with a higher likelihood of hallucination), the system either refrains from including that text or attaches a disclaimer. Developers can also allow for user feedback at run-time – e.g. providing a “report answer” or “was this correct?” prompt – to catch hallucinations that slip through and continually improve the system. In high-stakes uses (like medical or legal advice generation), a human-in-the-loop is essential: the AI’s suggestions are treated as drafts that a human expert must vet. Operationally, organisations might maintain a list of known “hallucination traps” (questions or prompts that often lead the model astray) and either pre-emptively handle those differently or warn users. The run-time philosophy is “trust, but verify” – assume the model might be making things up and put checks in place so that hallucinations are detected and stopped before causing harm.
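Here is a minimal sketch of such a confidence-threshold gate, assuming your model API can return per-token log-probabilities. The `generate_with_logprobs` function is a placeholder, and the threshold is illustrative and would need to be calibrated against a labelled evaluation set.

```python
# A minimal sketch of a confidence-gated output: low mean token log-probability
# routes the answer to human review rather than returning it directly.

CONFIDENCE_THRESHOLD = -1.5  # mean log-prob per token; tune on a labelled set

def generate_with_logprobs(prompt: str) -> tuple[str, list[float]]:
    raise NotImplementedError("Replace with your model/provider API call")

def guarded_answer(prompt: str) -> dict:
    text, token_logprobs = generate_with_logprobs(prompt)
    mean_logprob = sum(token_logprobs) / max(len(token_logprobs), 1)
    if mean_logprob < CONFIDENCE_THRESHOLD:
        # Low confidence: attach a flag so the application can add a disclaimer
        # or hold the answer for human review.
        return {"text": text, "flag": "low_confidence", "action": "route_to_human_review"}
    return {"text": text, "flag": "ok", "action": "return_to_user"}
```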
Bias and Fairness Issues
AI systems can exhibit bias, leading to unfair or discriminatory outcomes – for example, a lending model that systematically underestimates creditworthiness for a certain demographic, or a face recognition system that works better on lighter skin tones than darker skin tones. These biases often originate from biases in training data or flawed model assumptions. The risk here is multi-faceted: ethical and legal repercussions, damaged reputation, and harm to affected groups. Addressing AI bias requires interventions both when building the system and during its operation.
Design-Time Controls: The design phase is the best place to prevent biases from taking root. One primary control is dataset curation – using diverse and representative training data. If certain groups or scenarios are underrepresented in the data, the model will likely perform poorly on them; thus, gathering additional data or re-balancing the dataset can mitigate bias. Teams should also apply bias mitigation algorithms during model training, such as re-weighting or re-sampling techniques that ensure the model doesn’t overfit to majority groups, or fairness-constrained optimisation that adjusts the model’s objective function to penalise disparate impacts. It’s important to define fairness metrics relevant to the context (e.g., equal false positive rates across demographic groups, statistical parity in outcomes, etc.) and evaluate the model against them in validation. If the model violates fairness criteria, iterative improvements are needed (like tweaking the model or features).
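As a sketch of what evaluating the model against such fairness metrics might look like, the snippet below computes selection rates and false positive rates per group on a validation set. The four-fifths ratio check is a common rule of thumb rather than a legal standard, and the data shown is purely illustrative.

```python
# A minimal sketch of pre-deployment fairness checks across demographic groups.
import numpy as np

def group_metrics(y_true, y_pred, groups):
    """Compute selection rate and false positive rate for each group."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        negatives = mask & (y_true == 0)
        results[g] = {
            "selection_rate": y_pred[mask].mean(),
            "fpr": y_pred[negatives].mean() if negatives.any() else float("nan"),
        }
    return results

# Illustrative validation data: labels, binary predictions, and group membership
y_true = np.array([0, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

metrics = group_metrics(y_true, y_pred, groups)
rates = [m["selection_rate"] for m in metrics.values()]
if min(rates) / max(rates) < 0.8:
    print("Selection-rate disparity exceeds the four-fifths rule of thumb:", metrics)
```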
Another design-time practice is pre-deployment bias testing or audit: essentially a “fairness stress test” where the model is tested on synthetic or hold-out data for known bias patterns. Many organisations now have review checkpoints before an AI system is launched, where an interdisciplinary team examines it for potential bias issues. Finally, involving diverse perspectives in the design can help – e.g., having domain experts, ethicists, or representatives of impacted communities review the model’s behaviour. By catching and fixing bias early, you avoid costly revisions later.
Run-Time Controls: Even with careful design, continuous vigilance in operation is necessary because biases can emerge or re-emerge as data and usage evolve. One control is ongoing monitoring of outcomes by group. For instance, if an AI hiring tool is used, track metrics like selection rates by gender or ethnicity in real time. If they start to skew, that’s a signal to intervene. Some biases might only become apparent with large-scale use or due to feedback loops (e.g. if an algorithm downgrades certain applicants, those applicants might stop applying over time, further skewing the data). Regular audits of the AI system in production are thus essential – this could mean periodically retraining the model on recent data and checking if any new biases have crept in, or having an external or internal audit team simulate decisions for different subgroups to see if disparities exist.
Another operational treatment is providing explanations and recourse: for high-impact decisions (loans, job screening, etc.), affected users should be able to get an explanation of the AI’s decision and a way to contest or correct it. This doesn’t remove the bias per se, but it mitigates harm by bringing human review into potentially biased outcomes. In some systems, real-time bias mitigation can be applied – for example, a content recommendation system might enforce that a certain percentage of content shown is diverse, to avoid reinforcing a bias that the user’s past behaviour might cause. Finally, keep a human in the loop for oversight – not for every single decision, but through spot checks. An organisation might randomly sample decisions and have a committee review them for fairness. If they find issues (say a recruiting AI consistently scores certain universities lower without merit), they can adjust the model or its rules. In summary, run-time bias controls revolve around monitoring, transparency, and human oversight to catch unfair patterns early and correct course, as well as maintaining an organisational process (e.g. an AI ethics board) that continuously evaluates the AI’s impact on fairness.
Adversarial Inputs and Robustness Vulnerabilities
Adversarial risk refers to malicious attempts to fool or exploit AI systems. Examples include adversarial input attacks (tiny perturbations to an image that cause a classifier to mislabel it), input injections (feeding a toxic prompt to a chatbot to get it to produce disallowed content), or more subtle tactics like model evasion and data poisoning. Such vulnerabilities can be dangerous in security-critical AI (think of an attacker tricking an autonomous vehicle’s vision or bypassing an AI-based fraud detector). Mitigating adversarial risks is challenging because attackers are continually innovating, but there are well-established defensive techniques.
Design-Time Controls: To build robustness, developers can use adversarial training, which is essentially training the model on examples of attacks. By generating adversarial samples (e.g., images slightly altered to confuse the model) and including them in the training data, the model learns to resist those tricks. This has been shown to significantly improve a model’s resilience to known attack patterns. Another design measure is input preprocessing – designing the system to sanitise or normalise inputs in a way that removes potential attack noise. For instance, an image classifier might automatically crop or denoise images which can nullify some adversarial perturbations. Similarly, a language model interface might filter or encode user inputs to catch known exploits (like prompt injection patterns).
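Here is a minimal sketch of FGSM-style adversarial training in PyTorch, assuming a standard classification setup (the model, optimiser and data batches are placeholders you already have). Production pipelines typically use stronger attacks such as PGD and carefully tuned epsilon values.

```python
# A minimal sketch of adversarial training using the Fast Gradient Sign Method (FGSM).
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """Generate an adversarial example by perturbing x along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step on a mix of clean and adversarial examples."""
    model.train()
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```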
Security reviews should be part of the AI system design: perform red-team testing where experts role-play attackers and try to break the model before launch. Their findings (such as certain triggers that cause pathological behaviour) can inform added rules or model adjustments. Developers also often incorporate redundancy and validation at design time. If a single model is high risk, one might deploy an ensemble of models or an out-of-distribution detector alongside it. For example, a system could have a lightweight model that flags anomalous inputs (ones that don’t look like the training distribution) – if triggered, the system might reject the input or route it for further analysis rather than processing it blindly. In summary, designing for adversarial robustness means anticipating attacks and baking in defences: robust model training, rigorous testing, and fail-safes for unusual inputs.
Run-Time Controls: Even with robust design, new attacks can emerge, so run-time defences are crucial. One important control is adversarial input detection. The system should monitor inputs and model behaviour for signs of attack in real time. For instance, if a series of inputs seems intentionally crafted (e.g., a flood of inputs near a decision boundary or inputs that cause abnormal internal activations in the model), the system can raise an alarm. Techniques like monitoring the model’s confidence or using separate detectors (statistical tests on inputs) can help catch potential adversarial attempts.
Another control is to limit exposure: if the AI is offered as a service (say an API), implement rate limiting and throttling – this prevents attackers from probing the model with unlimited queries to find weaknesses. Similarly, output filtering can ensure that even if an adversarial input gets through, the output doesn’t cause harm (for example, a generative model might refuse to produce certain sensitive content even if prompted). Logging and monitoring are particularly important: maintain detailed logs of inputs and outputs; unusual spikes or patterns can indicate an ongoing attack. For critical systems, real-time monitoring by an operations team or automated system can enable a quick response – e.g., switch the model to a “safe mode” or offline mode if under attack. Additionally, patching and updates are part of run-time treatment: when new vulnerabilities are discovered (perhaps via academic research or incidents), quickly update the model or its filters. In fields like cybersecurity, there is a practice of continuous threat intelligence – the same applies here, where AI security teams should stay abreast of the latest adversarial techniques and ensure the system is updated to handle them. Ultimately, addressing adversarial risk is an ongoing process of “protect, detect, respond”: protect through robust design, detect attacks in operation, and respond by isolating or adapting the system to nullify the threat.
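As one small, concrete example of the “limit exposure” controls mentioned above, here is a sliding-window rate limiter sketch. The window size and request limit are illustrative, and in practice this logic usually sits in an API gateway rather than application code.

```python
# A minimal sketch of per-client rate limiting for a model API, to slow down
# attackers probing the model with large numbers of queries.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100

_request_log: dict[str, deque] = defaultdict(deque)

def allow_request(client_id: str) -> bool:
    now = time.monotonic()
    window = _request_log[client_id]
    # Drop timestamps that have fallen outside the rolling window
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        return False  # throttle: too many queries, possible probing behaviour
    window.append(now)
    return True
```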
Harmful Content in AI Systems
AI systems — especially generative models — can produce or amplify harmful content, including hate speech, misinformation, self-harm encouragement, explicit material, or abusive language. This risk is particularly acute in open-ended, user-facing applications like chatbots, content creation tools, and recommendation systems. The consequences can be severe: legal violations (such as child protection laws or the EU AI Act), reputational damage, and real-world harm to individuals or communities. Crucially, this content does not have to be false to be dangerous — harmful outputs can be entirely factual yet still abusive, discriminatory, or inciting.
Effective treatment of harmful content risks involves both proactive design decisions and vigilant operational safeguards. Responsibility for harm prevention doesn’t lie solely with upstream model providers; any organisation deploying AI must implement governance, testing, and real-time controls to meet its legal and ethical obligations to avoid harmful content.
Design-Time Controls: Mitigating harmful content begins with training and architectural decisions. Developers should start by carefully curating training datasets to exclude toxic, hateful, or abusive examples, especially when models are fine-tuned on custom corpora. Implementing content classification filters during training (e.g., blocking or down-weighting harmful examples) is a foundational step. Don’t assume that third party data providers have completely and thoroughly scrubbed the data they provide of harmful content.
More advanced strategies include integrating safety-focused reinforcement learning — for example, using human feedback to penalise outputs that are offensive, unethical, or incite harm. Architecturally, systems can incorporate safety layers such as moderation APIs, output classifiers, or toxic content scoring mechanisms into the generation pipeline. This allows the model to abstain, rephrase, or escalate outputs before they are shown to users. Developers can also configure thresholds for confidence, source verifiability, or output category to trigger warnings or human oversight.
The system design phase is generally the best time to implement policies for harm detection, disclosure, and escalation, including the creation of harmful content taxonomies. These definitions are critical for training classifiers and setting response protocols. Finally, regulatory alignment (e.g., with the EU AI Act, Digital Services Act, FTC rules, or Australia’s Online Safety Act) must be built into documentation, impact assessments, and deployment approval gates.
Run-Time Controls: Even with the best design practices, harmful content risks persist at run-time — particularly in interactive or generative contexts. Systems must be equipped with real-time monitoring and response capabilities. This includes automated content moderation filters (for toxicity, hate speech, explicitness, etc.), as well as dynamic tools like prompt injection guards or input sanitisation routines to block exploitative prompts.
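A minimal sketch of such a run-time moderation gate might look like the following, where `score_toxicity` is a placeholder for whatever moderation classifier or API you use, and the thresholds are illustrative values to be tuned against your content policy.

```python
# A minimal sketch of a run-time moderation gate: score each candidate output
# and either release it, escalate it to a human moderator, or block it.

BLOCK_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.6

def score_toxicity(text: str) -> float:
    raise NotImplementedError("Replace with your moderation classifier/API")

def moderate_output(text: str) -> dict:
    score = score_toxicity(text)
    if score >= BLOCK_THRESHOLD:
        return {"action": "block", "message": "Output withheld by safety filter"}
    if score >= REVIEW_THRESHOLD:
        return {"action": "escalate", "message": text, "queue": "human_moderation"}
    return {"action": "release", "message": text}
```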
When harmful outputs do appear, systems should trigger intervention protocols: this may mean filtering or rewriting content, issuing warnings, disabling parts of the system, or escalating to a human moderator. Many organisations also implement user feedback channels (e.g. “report this output”) which feed into ongoing improvement and alerting. For high-risk deployments (e.g., chatbots with minors, mental health tools, or open-ended search assistants), a human-in-the-loop is critical. This may involve real-time monitoring, escalation paths for flagged outputs, or mandatory human review before content is published or acted upon.
Robust logging and traceability are also essential: every prompt and output should be recorded so that harmful incidents can be investigated and remediated. This supports transparency, regulatory compliance, and systemic improvement. If harmful content is discovered after the fact, notice-and-takedown procedures should allow for swift mitigation and user protection.
So preventing harmful content in AI systems needs this layered approach: eliminate as much as possible at design time, and control, detect, and respond rapidly at run-time. If you want to read more about this specific risk, I previously wrote a few articles based on work I was doing on this:
Loss of Personal or Confidential Information
AI systems — particularly those based on large-scale data ingestion or open-ended interaction — carry a significant risk of exposing private, sensitive, or confidential information. This can happen in various ways: through inadvertent memorisation of personal data during training, through over-sharing in generative responses (e.g. chatbots disclosing internal knowledge), or via insufficient access controls in deployed environments.
These risks can lead to violation of legal obligations under data protection laws (such as GDPR, HIPAA, or CCPA), harm individuals whose data is disclosed, and undermine trust in AI technologies. Importantly, privacy loss is not always the result of malicious action — accidental design oversights, unclear boundaries between internal and external data, or unexpected emergent behaviour in models can all lead to leakage. A privacy-aware AI governance strategy must therefore address this risk comprehensively — from what data is collected and how it's used, to what outputs the system is allowed to generate, to how breaches are detected and handled. Strong protections require both pre-emptive design choices and robust operational controls.
Design-Time Controls: Privacy protection begins with how training data is selected, processed, and documented. A key principle is data minimisation: only include data necessary for the model’s intended function, and avoid ingesting sensitive or personally identifiable information (PII) unless it is essential and legally justified. This requires careful dataset audits, documentation of provenance, and formal processes for data redaction or anonymisation.
Developers should also implement differential privacy or similar techniques that limit the model’s ability to memorise individual examples. Where appropriate, models should be trained with privacy-preserving algorithms that introduce randomness or limit the influence of any single datapoint. For generative models, it’s vital to test for memorisation and leakage. This can include red-teaming for model recall of training inputs (e.g., asking a chatbot to repeat email addresses or phone numbers), as well as quantitative tests for exposure of rare strings. If memorisation is detected, retraining with filtered or masked data may be necessary.
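One way to test for memorisation is a simple canary probe: seed the training corpus with known synthetic secrets and check whether the model reproduces them when prompted with their prefixes. The sketch below assumes a placeholder `generate` function and uses invented canary strings.

```python
# A minimal sketch of a memorisation/leakage probe using planted canary strings.

CANARIES = [
    ("My account recovery code is", "ZX-4417-QP"),
    ("Contact Jane directly at", "jane.doe@example.com"),
]

def generate(prompt: str) -> str:
    raise NotImplementedError("Replace with your model's text-generation call")

def check_memorisation() -> list[str]:
    """Return any canary secrets the model reproduces from their prefixes."""
    leaked = []
    for prefix, secret in CANARIES:
        completion = generate(prefix)
        if secret.lower() in completion.lower():
            leaked.append(secret)
    return leaked

# A non-empty result suggests the model memorised training data and may warrant
# retraining on filtered or masked data, as discussed above.
```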
From a governance perspective, design-time practices should include clear data handling policies, privacy impact assessments, and the implementation of technical and organisational safeguards required by applicable privacy laws. Teams should also design with access control in mind — defining which users or components can interact with sensitive outputs or training data.
Run-Time Controls: Even with privacy-conscious design, operational controls are essential. One key protection is real-time output filtering, which detects and blocks potential disclosures — for example, models generating what appears to be personal data (names, contact info, internal codes). This often relies on Named Entity Recognition (NER), PII detection tools, or custom content filters tailored to an organisation’s risk profile.
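As a small illustration, here is a regex-based PII output filter covering emails and simple phone-number formats. Real deployments typically combine patterns like these with NER-based detectors; the expressions shown are illustrative and will not catch every variant.

```python
# A minimal sketch of a regex-based PII output filter.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b(?:\+?\d[\s-]?){9,14}\d\b"),
}

def redact_pii(text: str) -> tuple[str, list[str]]:
    """Return the redacted text and the list of PII types that were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text, found

clean, detected = redact_pii("You can reach me at alice@example.com or +44 20 7946 0958.")
if detected:
    print(f"Blocked/redacted output containing: {detected}")
print(clean)
```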
Systems should implement role-based access controls and logging, ensuring only authorised users can access sensitive model functions or datasets. For API-based models, rate limits, query validation, and input sanitisation help prevent misuse or probing attacks. For end-user interfaces, it is wise to offer disclaimers and warnings (e.g., “do not share sensitive personal data here”) and to prevent the model from storing user inputs unless explicitly consented to. One useful approach is to implement ephemeral sessions, ensuring that prompts and responses are not retained unless necessary.
Run-time privacy incidents should trigger escalation protocols, including content takedown, user notification (if required), and internal investigation. Systems should be regularly audited for leakage, and logs analysed for suspicious access patterns. In short, effective privacy protection needs a combination of proactive restraint and monitoring — designing models to respect privacy boundaries and equipping operations teams to enforce and monitor those boundaries consistently.
Feedback Loops and Behaviour Amplification
Certain AI systems, especially those interacting with user behaviour (social media recommenders, personalised feeds, etc.), face the risk of feedback loops. This is when the AI’s output influences user behaviour or the environment, which in turn provides input back to the AI, potentially creating a self-reinforcing cycle. A classic example is a recommendation algorithm that learns a user likes sensational content; it shows more sensational content, which increases the user’s preference for it, leading the algorithm to go even further down that rabbit hole. Over time this can amplify biases or undesirable behaviours (e.g. contributing to echo chambers, radicalisation, or simply a narrowing of content exposure). In an enterprise context, a feedback loop risk might be an automated trading AI that influences market conditions through its actions, then reacts to those very changes, possibly causing instability.
Design-Time Controls: To mitigate feedback loops, the system should be designed with balance and diversity in mind. One approach is to explicitly include diversity-promoting mechanisms in algorithms. For instance, a recommender system can be programmed to occasionally explore – showing content outside the user’s immediate preference profile (the “explore new genres” feature). This prevents the spiral of only reinforcing known preferences. Another design measure is setting constraints on optimisation: instead of optimising solely for engagement (which often drives amplification of whatever gets engagement), use multi-objective optimisation that includes metrics for diversity, long-term user satisfaction, or well-being. In other words, bake into the design a goal that discourages the AI from focusing narrowly on one reinforced signal.
Conducting simulation studies during design can also reveal potential feedback problems – for example, simulate how the recommendations would evolve over many iterations with virtual users to see if the system converges to a degenerate state. If it does, adjust the algorithm (perhaps adding decay factors or resetting aspects of the model periodically). Another treatment at design is user control features: give users ways to reset or broaden their profile (like “refresh recommendations” or easily start seeing different content categories), which can counteract algorithmic echo chambers. Essentially, design-time efforts should make the AI aware of the risk of self-reinforcement and include features that introduce randomness, diversity, and user choice to break potential loops. Finally, incorporate policy constraints – for social platforms, this might be rules like “don’t recommend content from the same creator more than X times in a row” or “if user has seen very similar content 5 times, inject something different.” These rules, set during design, ensure a baseline of variety.
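To illustrate the exploration and “not too much of the same” rules described above, here is a minimal re-ranking sketch. The item structure, epsilon value and per-category cap are illustrative assumptions rather than recommended settings.

```python
# A minimal sketch of a diversity-promoting re-ranker: reserve a fraction of
# slots for exploration and cap repetition from any single category.
import random

EPSILON = 0.15          # probability of swapping in an exploration item
MAX_PER_CATEGORY = 3    # policy-style constraint on repetition

def rerank(ranked_items, catalogue, slate_size=10):
    """ranked_items and catalogue are lists of dicts with 'id' and 'category' keys."""
    slate, per_category = [], {}
    explore_pool = [i for i in catalogue if i not in ranked_items]
    for item in ranked_items:
        if len(slate) >= slate_size:
            break
        if random.random() < EPSILON and explore_pool:
            # Exploration: show something outside the user's usual profile
            item = explore_pool.pop(random.randrange(len(explore_pool)))
        if per_category.get(item["category"], 0) >= MAX_PER_CATEGORY:
            continue  # enforce the "don't keep recommending the same thing" rule
        slate.append(item)
        per_category[item["category"]] = per_category.get(item["category"], 0) + 1
    return slate
```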
Run-Time Controls: Once the AI system is live, monitoring for feedback loop effects is key. Platforms should track metrics like content diversity over time, concentration of recommendations, or user pathologies (e.g., sudden shifts in user behaviour that might indicate the algorithm pushed them a certain way). If signals of an unhealthy feedback loop appear – say a sharp polarisation in the type of content consumed – operators might intervene by adjusting the algorithm’s parameters or manually injecting diverse content. Regular auditing of the AI’s impact on user populations is an important control: for example, an audit might reveal that a music recommendation service is only ever reinforcing one genre to some users. Armed with that knowledge, the team can deploy updates (like a new model version that has a higher exploration rate). User feedback mechanisms also help break loops: letting users easily indicate “I’m getting tired of this kind of content” or “show me something new” provides a release valve.
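A simple way to operationalise that diversity monitoring is to track the entropy of recommended categories per user over a rolling window, as in the sketch below; the alert threshold is illustrative and should be tuned against healthy baseline traffic.

```python
# A minimal sketch of a run-time diversity monitor based on Shannon entropy
# of the categories a user has recently been shown.
import math
from collections import Counter

ENTROPY_ALERT_THRESHOLD = 1.0  # bits; tune against healthy baseline behaviour

def category_entropy(recent_categories: list[str]) -> float:
    counts = Counter(recent_categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

recent = ["politics"] * 18 + ["sport", "music"]  # last 20 items shown to one user
if category_entropy(recent) < ENTROPY_ALERT_THRESHOLD:
    print("ALERT: content exposure narrowing - consider injecting diverse items")
```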
On the system side, periodic resets or perturbations can be scheduled – some recommenders periodically randomise a small fraction of recommendations to prevent lock-in. In sensitive cases like finance or safety, having a human supervisor in the loop can catch runaway behaviour: e.g., a human can notice if an AI trading bot is engaging in a loop with another bot and shut it down. It’s also important operationally to update models with fresh data that incorporate any exogenous changes. If the only feedback the model gets is its own influenced data, it can drift; injecting external data or periodically retraining from scratch can reduce that self-reinforcement. In sum, run-time treatments for feedback loops involve active monitoring and course-correction – ensuring the AI’s outputs do not spiral unchecked. Some organisations explicitly review their AI systems for “feedback loop bias” and use metrics or tools to maintain healthy system behaviour. By diversifying inputs and having humans oversee the outcomes, one can dampen harmful amplification effects.
Overreliance on Automation / Erosion of Human Oversight
As AI systems become more capable, there is a risk that humans trust them too much – a phenomenon often called automation bias. Overreliance on AI can lead to situations where humans become passive, failing to catch AI errors or blindly following AI recommendations even when common sense would disagree. In critical domains (healthcare, aviation, legal, etc.), this erosion of human oversight can be dangerous. For example, a doctor might accept an AI diagnostic suggestion without double-checking symptoms, or a pilot might hesitate to override an autopilot even when it’s misreading the situation. The goal of risk treatment here is to keep the human engaged and informed, using AI as a tool with human judgment, not a replacement for it.
Design-Time Controls: The interface and role of the AI should be designed explicitly to support human decision-makers, not displace them inappropriately. One approach is to build in transparency and explainability – ensure the AI can show why it made a recommendation (e.g. highlighting the factors or providing a rationale). This helps users evaluate the AI’s reasoning rather than taking it at face value. Another is designing cognitive forcing functions or checkpoints in the workflow. For instance, require the user to input a brief justification whenever they accept an AI’s high-stakes recommendation (“Tell us why you agree with the AI’s assessment”), which forces a moment of reflection. In software, this might be a simple dialog box or an approval step. Similarly, for certain critical suggestions, the system could deliberately present a second opinion or devil’s advocate view (“Another model/policy thinks differently”), prompting the human to reconcile the difference. It’s also crucial to educate and train users during the design rollout. As part of onboarding, show them examples of the AI being wrong or uncertain, so they learn its limitations and don’t develop a false sense of infallibility. In terms of system capabilities, the AI could be designed to defer to a human when unsure – for example, an autonomous vehicle might hand control back to the driver in ambiguous situations. By designing AI that is collaborative (augmenting human decision-making with suggestions and insights) rather than fully autonomous in all cases, we preserve a meaningful role for humans. Finally, set clear policies that some decisions must have human sign-off. For example, a bank might design a rule that “AI can flag transactions, but a human analyst must approve before account closure,” thereby institutionalising human oversight at design time.
Run-Time Controls: In operation, preventing overreliance involves both system features and organisational practices. One effective control is real-time user feedback or warnings. If the system detects that a user is accepting all AI recommendations without question (maybe a pattern of always clicking “OK” immediately), it could prompt: “Reminder: please review the AI’s suggestion carefully.” This kind of nudge can break complacency. Systems can also monitor themselves for confidence and flag low-confidence decisions for mandatory human review, ensuring that when the AI is on shaky ground, the human is alerted (“The AI is uncertain about this case, please double-check”). An example from aviation: modern autopilots will sound an alert if they encounter conditions outside their normal operating envelope, effectively telling the pilot “it’s your turn!”.
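The sketch below illustrates two of these nudges: routing low-confidence decisions to mandatory human review, and warning reviewers who appear to be rubber-stamping recommendations. The thresholds and the review-record structure are illustrative assumptions.

```python
# A minimal sketch of two run-time oversight nudges against automation bias.

CONFIDENCE_FLOOR = 0.7   # below this, the AI's decision needs human review
MIN_REVIEW_SECONDS = 5   # reviews faster than this look like rubber-stamping
STREAK_LIMIT = 20        # how many quick acceptances before we nudge the reviewer

def route_decision(confidence: float) -> str:
    """Send low-confidence decisions to a human queue instead of auto-approving."""
    return "human_review_required" if confidence < CONFIDENCE_FLOOR else "auto_approve_allowed"

def check_reviewer_engagement(recent_reviews: list[dict]) -> str | None:
    """recent_reviews: dicts with 'accepted' (bool) and 'seconds_spent' (float)."""
    rubber_stamps = [
        r for r in recent_reviews
        if r["accepted"] and r["seconds_spent"] < MIN_REVIEW_SECONDS
    ]
    if len(rubber_stamps) >= STREAK_LIMIT:
        return "Reminder: please review the AI's suggestions carefully before accepting."
    return None
```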
In a corporate setting, periodic drills or audits help maintain oversight. For instance, supervisors might randomly inspect some percentage of AI-made decisions to verify if they were correct and see if the human operator appropriately vetted them. If they find rubber-stamping, that triggers retraining of staff. Another strategy is limiting continuous automated operation – e.g., requiring human intervention or at least a pause after a certain number of automated actions. This is akin to ensuring pilots manually fly occasionally to retain their skills. Culturally, organisations should encourage a mindset of “AI is fallible”. This can be done by regularly sharing examples of AI errors that were caught by diligent employees, reinforcing that oversight matters. On the flip side, adjusting incentives might be necessary: if employees are pressured to handle cases quickly, they may lean too much on AI for speed; ensuring quality metrics are valued will motivate them to take the extra step of verification. In sum, run-time treatments for overreliance involve actively engaging the human operators – through alerts, through workflow requirements, and through a culture of accountability. The human-AI team works best when the human understands they are the final check, and the AI understands when to ask for help. By balancing trust and scepticism in daily operations, organisations get the best of both: AI efficiency and human judgment.
Closing Thoughts on Control Selection
In conclusion, managing AI risks is about selecting the right controls for the right risk. A risk like model drift is best met with robust monitoring and retraining processes, whereas a risk like AI bias demands fairness-by-design and ongoing audit. No single mitigation or framework covers all scenarios – a safety feature that works for one risk (e.g. adversarial training for attacks) won’t address another (e.g. feedback loops). Thus, practitioners must take a contextual approach: understand the failure modes of their specific AI system and tailor a portfolio of treatments. By applying design-time safeguards to prevent issues and run-time measures to detect and respond to issues, you can substantively reduce each AI risk to an acceptable level. The emphasis is on being proactive and specific – identify the particular challenges (drift, hallucination, bias, etc.) and deploy targeted controls that directly address those challenges, rather than relying on generic AI governance platitudes. This way, AI systems can be made safer, more reliable, and more aligned with our goals throughout their life cycle.
One crucial insight that emerged clearly for me in the process of this writing was a more precise awareness that AI risk management isn't primarily a technical endeavour. While algorithms and automated safeguards play important roles, a significant portion of these controls ultimately rely on human action and intervention. From fairness audits to adversarial testing, from content moderation to oversight of automated decisions, people remain at the centre of responsible AI governance. This human element isn't incidental—it's essential. If you examine your implementation plan and find it dominated by purely technical controls with minimal human involvement, that's a red flag. The most robust risk management approaches blend technological safeguards with human judgment, expertise, and oversight. Technical controls can detect when something might be going wrong, but human judgment is often needed to understand the deeper implications and make contextual decisions about appropriate responses. This balance between automated protection and human wisdom creates a more resilient system than either could provide alone.
This article has been one of the most challenging for me to research and write in the entire series. Piecing together what I believe are the most effective controls for each risk type has drawn a lot from both my personal experience implementing AI governance and my research over the past year into emerging best practices. The process of distilling the vast set of complex technical and operational safeguards into practical, actionable guidance was a balancing act — trying to be specific enough to be useful, yet adaptable enough for diverse contexts.
I've intentionally focused on the most common and consequential AI risks, knowing full well there are countless more risks, control variations and nuanced approaches I couldn't include while keeping this guide practical. Each AI system you work with will have its unique risk profile that may require tailored controls beyond what I've outlined here. Our field is also evolving rapidly, with new defence mechanisms emerging alongside novel risks; it could take many articles to cover all of these approaches (and perhaps they will be the subject of some articles in the future).
Despite the challenge, assembling this comprehensive framework has been immensely rewarding. There's something deeply satisfying about mapping the contours of effective risk treatment—showing how thoughtful design-time decisions can prevent issues before they arise, and how vigilant run-time monitoring can catch what inevitably slips through. My hope is that you'll find these control mappings not just informative but immediately applicable to your own AI governance work.
Now I promised a four-part series on AI risk management, but I find I need one more article to properly complete the journey. In the coming days, I'll share the final piece: a complete AI risk management policy template that you can adapt and implement within your organisation's governance framework. This template will synthesise all the concepts we've explored—from risk identification to assessment to treatment—into a cohesive document ready for practical application.
Stay tuned for that final instalment and thank you as always for joining.