What's in a real-world AI Management System, Part 2: Risk, Processes and Documentation
In this part, we build on the governance framework with effective risk management, streamlined operational processes and pragmatic documentation.
In my previous article[1], I introduced the four essential components of an AI Management System - the governance structure that forms its decision-making backbone, the risk management framework that serves as an early warning system, the operational processes that guide day-to-day activities, and the documentation system that maintains institutional memory. I explored how these elements work together to enable responsible innovation at scale, and I dove a little deeper into what makes an effective governance framework.
Now, I’ll go through the remaining three components at a high level, starting with how to build a risk management framework that goes beyond traditional IT approaches to address risks to AI systems.
The Risk Management Framework
So, let’s get into what a robust risk management framework for AI really looks like in practice. This isn't about creating complex risk matrices that nobody uses - it's about practical tools that help you identify and address problems before they become serious issues.
First, you need a solid taxonomy of AI-specific risks - a shared language for talking about different types of problems. As I described in the previous article, I break these into four key categories: technical risks (like model drift or data quality issues), ethical risks (like bias or fairness concerns), operational risks (like deployment failures or monitoring gaps), and strategic risks (like regulatory non-compliance or reputational damage). For each category, you need clear indicators that help teams identify when risks are emerging. There are some useful risk libraries that can get you started on categorisation, like the MITRE Atlas[2] or the vast MIT AI Risk Repository[3], but be careful how you use them. I say this because they are extremely comprehensive and can be overwhelming - the MIT AI Risk Repository, for instance, catalogues over a thousand risks, and enumerating all of those for your organisation would swamp you and your stakeholders in analysis paralysis.
So, start with risk categories and build from there. For each category, think through the specific thresholds you expect to trigger action. For instance, what level of accuracy drop requires immediate investigation? What degree of demographic disparity in model outputs indicates potential bias, and is there a level at which that would become a significant issue? When do performance variations suggest your model is operating outside its designed parameters? How much downtime would be acceptable if the system had to be taken offline? These questions will help you define your organisation's tolerance for risk in concrete terms.
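To make this concrete, here is a minimal sketch of how those tolerances might be written down as a simple, checkable configuration. The four categories come from the taxonomy above, but the indicator names and numbers are placeholder assumptions you would replace with the outcomes of your own risk-appetite discussions.

```python
# Illustrative only: the categories match the taxonomy in the text, but the
# indicator names and numbers are placeholder assumptions.
RISK_THRESHOLDS = {
    "technical":   {"accuracy_drop_pct": 5.0, "null_rate": 0.02},
    "ethical":     {"demographic_parity_gap": 0.10},
    "operational": {"monthly_downtime_minutes": 60},
    "strategic":   {"open_compliance_findings": 1},
}

def requires_action(category: str, indicator: str, observed: float) -> bool:
    """True when an observed value meets or exceeds its tolerance threshold."""
    return observed >= RISK_THRESHOLDS[category][indicator]

# Example: a 7-point accuracy drop on a technical indicator triggers investigation.
assert requires_action("technical", "accuracy_drop_pct", 7.0)
```

The point is not the code itself but the discipline: each category carries explicit thresholds that anyone can check, rather than vague intentions.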
Your risk assessment methodology needs to guide teams through identifying and evaluating different types of AI systems consistently. You may find your organisation already has some form of risk management methodology that is worth reviewing and aligning with, or you could look at some of the common frameworks. ISO 31000[4], for instance, provides a good foundation with its emphasis on integrating risk management into organisational processes and decision-making. For AI-specific considerations, NIST's AI Risk Management Framework (AI RMF)[5] offers detailed guidance on managing risks across the AI lifecycle. Some organisations might already use the Common Vulnerability Scoring System (CVSS) framework[6] for cybersecurity, and for them it may make sense to quantify AI risks in a similar way. You might want to look at some industry practices such as Microsoft's Responsible AI Standard[7] for comparison. The key is not to adopt these frameworks wholesale, but to understand their principles and adapt them to your organisation's specific AI use cases and risk appetite. I've seen teams achieve the best results when they start with a recognised framework as a foundation, then iteratively refine it based on their practical experience and lessons learnt.
However, in that process, it's important to keep in mind how different AI risks are from conventional technology risks. Traditional risk management frameworks serve us well for conventional systems where causes and effects are clearly linked - like predicting server failures or estimating the impact of data breaches. But AI systems introduce a fundamentally different dimension of uncertainty. They can evolve in unexpected ways, develop subtle biases, or make decisions we didn't anticipate - even while operating exactly as designed.
What makes AI risk management uniquely challenging is that risks cascade fast and amplify through feedback. A technical issue can trigger ethical concerns, which may create operational problems that ultimately pose strategic risks. This interconnected nature demands a more sophisticated approach to risk assessment that goes beyond traditional likelihood-impact matrices.
To capture these dynamics, organisations need to consider two additional factors beyond just likelihood and impact: impact velocity (how quickly a risk can escalate from minor concern to major crisis) and feedback potential (the presence of self-reinforcing loops that can amplify problems). Together, these create what we might call an "AI risk amplification factor" that helps organisations understand not just how severe a risk might be, but how quickly it could spiral out of control. In a subsequent article, I’ll detail the methodology for calculating this amplification factor and applying it to risk prioritisation.
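To illustrate the idea - and only the idea, this is not the scoring methodology I'll cover in that later article - here is one possible way to fold velocity and feedback potential into a conventional likelihood-impact score. The 1-5 scales and the weights are arbitrary assumptions.

```python
# One possible way to combine the four factors - purely an illustration.
# Inputs are on a 1-5 scale; the weighting is an arbitrary assumption.
def risk_priority(likelihood: float, impact: float,
                  velocity: float, feedback: float) -> float:
    """Amplify a classic likelihood x impact score by escalation speed and feedback loops."""
    base = likelihood * impact                       # traditional matrix score
    amplification = 1 + (velocity + feedback) / 10   # ranges from 1.2 to 2.0
    return base * amplification

# A moderate risk that escalates fast with strong feedback loops outranks a
# nominally "bigger" risk that develops slowly.
print(risk_priority(3, 3, velocity=5, feedback=5))   # 18.0
print(risk_priority(4, 3, velocity=1, feedback=1))   # 14.4
```

Whatever formula you settle on, the useful output is the ranking it produces: risks that can spiral quickly deserve attention even when their traditional matrix score looks moderate.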
The flip side of your risk framework should be your control mechanisms - but here again, AI systems demand quite a different approach from traditional IT controls. While IT typically focuses on adding protective layers around existing systems, AI controls often require us to reshape the systems themselves. Think of it this way: if a traditional IT system has a vulnerability, we can add security gateways and perimeter controls after it's built; an AI system is more like a building whose safety features must be architected into its very foundation.
This means our controls extend deep into the design phase. When we discover bias in our outputs, the solution often isn't a simple monitoring overlay - more likely it requires reengineering our training data collection, reformulating our model architecture, or fundamentally rethinking our approach to feature selection. If we find our model making dangerous errors in edge cases, we can't just add a warning system - we might need to expand our training data to better cover these scenarios or redesign our model's architecture to handle uncertainty more gracefully.
That said, we still need our three traditional categories of controls, but reimagined for AI's unique challenges:
🔐Preventive controls now start much earlier in the lifecycle - they include careful data curation, thoughtful model architecture decisions, and robust testing regimes that probe for potential failures before they occur. This might mean investing months in improving training data quality or completely redesigning model architectures to better handle edge cases.
⚠️Detective controls still monitor live systems, but they need to watch for AI-specific issues like concept drift, data distribution shifts, and emerging biases - subtle problems that can develop gradually over time. These controls need to be sophisticated enough to spot patterns that might indicate the model is starting to operate outside its designed parameters.
🛠️Corrective controls must be more nuanced than simple rollbacks (if there ever was such a thing as a simple rollback). Sometimes fixing an AI system means retraining on better data, adjusting model architectures, or even reconsidering fundamental design choices about how the system makes decisions. Teams need clear procedures not just for immediate responses, but for these deeper corrective actions that may take weeks or months to implement properly.
This more comprehensive view of controls requires closer collaboration between risk managers and AI development teams. It's not enough to have risk specialists designing controls - they need to work hand in hand with the scientists and engineers building and training the models to ensure safety and reliability are built in from the start.
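To give a flavour of what a detective control can look like in practice, here is a minimal sketch of a data distribution shift check, assuming Python with NumPy and SciPy. The feature, sample sizes, and significance level are illustrative; a real deployment would monitor many features and correct for multiple comparisons.

```python
# A minimal detective-control sketch: flag data distribution shift between a
# training snapshot and live traffic with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

def distribution_shifted(reference: np.ndarray, live: np.ndarray,
                         alpha: float = 0.01) -> bool:
    """True if the live feature distribution differs significantly from the reference."""
    _statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha

rng = np.random.default_rng(seed=0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # feature at training time
live = rng.normal(loc=0.4, scale=1.0, size=5_000)        # drifted live traffic
if distribution_shifted(reference, live):
    print("Alert: input distribution has shifted - investigate before it cascades.")
```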
What You Will Need to Do:
➡️Build your risk assessment framework:
Create templates for assessing different types of AI systems
Define clear criteria for risk levels with specific examples
Establish assessment frequency requirements for different risk categories
Document exactly what evidence is required for risk sign-off
Set up regular review cycles for updating risk criteria
➡️Implement your control mechanisms:
Deploy automated monitoring tools for model performance, data quality, and bias detection
Establish clear thresholds for different types of alerts
Create response procedures for different types of issues
Set up regular testing cycles for control effectiveness
Document all control activities and results
➡️Create your risk reporting structure:
Define what metrics need to be tracked and reported
Establish reporting frequencies for different risk levels
Create dashboards that provide clear visibility into risk status
Set up escalation triggers for risk threshold breaches
Implement trend analysis to spot emerging patterns (see the sketch after this list)
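On that last point, here is a sketch of what a lightweight trend check might look like, assuming Python and placeholder window and tolerance values. The aim is to escalate a metric that is steadily drifting in the wrong direction before any single reading breaches its hard threshold.

```python
# Illustrative trend check: escalate when a metric declines steadily over the
# last few reporting periods, even if no single value has crossed a hard
# threshold. Window and tolerance are placeholder assumptions.
import numpy as np

def emerging_decline(metric_history: list[float], window: int = 4,
                     max_avg_decline: float = 0.005) -> bool:
    """True if the average period-on-period decline over the window exceeds the tolerance."""
    recent = np.array(metric_history[-window:])
    return np.diff(recent).mean() < -max_avg_decline

weekly_accuracy = [0.910, 0.909, 0.902, 0.895, 0.887]
if emerging_decline(weekly_accuracy):
    print("Escalate: accuracy is trending down faster than tolerance allows.")
```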
Remember, the goal isn't to create a perfect risk management system on day one. Start with your most important AI systems and build out from there. Focus on establishing clear, actionable processes that teams can actually follow, and refine them based on experience. Most importantly, ensure your risk framework helps teams make better decisions about AI development and deployment - it should be a practical tool, not just a compliance exercise.
Operational Processes
Let me walk you through how operational processes work in practice - the day-to-day machinery that turns governance principles and risk frameworks into real actions. This is where theory meets reality, and in my experience, it's often where AI management systems either succeed or fail.
Think about what happens when a data scientist wants to retrain a model with new data, or when an operations team needs to investigate unusual behaviour in a deployed system. Without clear operational processes, these everyday activities can become bottlenecks that slow down development or, worse, create opportunities for serious mistakes. That's why getting these processes right is crucial.
The foundation of your operational processes should be a clear system lifecycle. This means mapping out exactly what needs to happen at each stage of an AI system's life - from initial development through deployment and eventually retirement. For development, you need specific requirements for data validation, model testing, and performance verification. During deployment, you need clear processes for monitoring, updating, and maintaining systems. For retirement, you need procedures for safely decommissioning models and preserving necessary records.
One of the most critical operational elements is your change management process. When teams need to update models, modify data pipelines, or adjust monitoring thresholds, they need clear procedures to follow. I've found that effective change management for AI systems requires three key elements: a way to assess the impact of proposed changes, a process for testing and validating changes before they go live, and clear criteria for what level of oversight different types of changes require.
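As a sketch of that third element, here is one way a tiered change policy could be written down. The change categories, testing requirements, and approvers are illustrative assumptions you would replace with your own.

```python
# A sketch of tiered change oversight. The change categories, testing
# requirements, and approvers are illustrative assumptions.
CHANGE_POLICY = {
    # change type                 (testing required,             approval required)
    "monitoring_threshold":       ("regression checks",          "team lead"),
    "retrain_same_architecture":  ("offline eval + shadow run",  "team lead"),
    "model_architecture_change":  ("full validation suite",      "AI review board"),
}

def required_oversight(change_type: str) -> tuple[str, str]:
    """Look up the testing and approval a proposed change must pass through."""
    return CHANGE_POLICY[change_type]

testing, approver = required_oversight("model_architecture_change")
print(f"Required testing: {testing}; sign-off from: {approver}")
```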
Many AI teams stumble in one particular area of operational process - they've mastered code versioning but struggle with the more complex world of model and data versioning. It's one thing to track changes in your codebase using tools like Git, but quite another to maintain clear lineage when your system's behaviour is shaped by the interplay of code, model weights, and training data. I've seen sophisticated engineering teams confidently manage their application code while losing track of which version of their model was trained on which dataset, or why certain hyperparameters were chosen for a particular training run. DVC (Data Version Control)[8] extends Git-like version control to data and models, making it easier to track large files and complex dependencies. At scale, there are commercial tools that might make sense for tracking experiments and the relationships between data, model architectures, and training runs. But at the start, it's just about treating your models and datasets with the same rigorous version control discipline you apply to code - each training dataset should be immutable and versioned, each model artifact should be tagged with its training data lineage, and each deployment should have a clear record of which versions of code, model, and data are working together to produce its outputs.
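To make that "clear record" concrete, here is a hand-rolled sketch (deliberately not DVC's own API) of the minimum lineage worth capturing for each deployment: the code commit, the dataset, and the model artefact that produced it. The file paths and manifest format are assumptions.

```python
# A hand-rolled sketch (not DVC's API) of a per-deployment lineage manifest:
# which code commit, dataset, and model artefact are working together.
import hashlib
import json
import subprocess
from datetime import datetime, timezone

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def write_lineage_manifest(model_path: str, dataset_path: str,
                           out_path: str = "lineage.json") -> dict:
    manifest = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "git_commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"]).decode().strip(),
        "model_sha256": sha256_of(model_path),
        "dataset_sha256": sha256_of(dataset_path),
    }
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```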
Your incident management process needs to be robust. When something goes wrong with an AI system (and it will go wrong!) - whether it's degraded performance, unexpected outputs, or potential bias - teams need to know exactly what steps to take. This means having procedures for initial response, investigation, resolution, and follow-up. Each type of incident should have specific response requirements and escalation paths.
Performance monitoring is another crucial operational area. You need to decide what metrics to track, how frequently to check them, and what thresholds should trigger action. This isn't just about technical metrics like accuracy or response time - it includes monitoring for bias, fairness, and other considerations. Teams need to know exactly what to look for and what to do when they spot potential issues.
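For example, one bias indicator worth tracking alongside accuracy is the gap in positive-outcome rates between demographic groups. The sketch below computes it in Python; the 0.10 tolerance is an assumption, not a standard.

```python
# One fairness indicator worth monitoring: the largest gap in positive-
# prediction rates across demographic groups.
import numpy as np

def demographic_parity_gap(predictions: np.ndarray, groups: np.ndarray) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    rates = [predictions[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

preds = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])
gap = demographic_parity_gap(preds, group)
if gap > 0.10:
    print(f"Parity gap of {gap:.2f} exceeds tolerance - trigger a bias review.")
```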
What You Will Need to Do:
➡️ Establish your development and deployment workflows:
Create standardized processes for data preparation and validation
Define specific testing requirements for different types of models
Document deployment checklists and verification steps
Set up quality control gates at key stages of development
Build clear handoff procedures between teams
➡️ Implement your change management system:
Define categories of changes and their required approvals
Create testing requirements for different types of changes
Establish rollback procedures for when changes cause problems
Set up change tracking and documentation requirements
Build verification processes for change implementation
➡️ Design your monitoring framework:
Identify key metrics for different types of AI systems
Set up automated monitoring tools and dashboards
Define alert thresholds and response requirements
Create regular reporting templates and schedules
Establish trend analysis procedures
➡️ Build your incident management process:
Create incident classification guidelines
Define response procedures for different incident types
Establish investigation and root cause analysis requirements
Set up incident tracking and reporting mechanisms
Build post-incident review procedures
Remember, these processes need to be living documents that evolve as you learn what works and what doesn't. Start with your most critical operations and simple processes, then build out from there. Make sure the workflows are clear enough that new team members can follow them but flexible enough to handle unexpected situations.
I've found that the most successful implementations are those that make these processes feel natural and useful rather than bureaucratic. They should help teams work more efficiently by providing clear guidance and removing uncertainty about what needs to be done. When done right, good operational processes actually speed up development by eliminating confusion and reducing rework.
Facing up to Documentation
Ok, documentation is the least exciting part of AI governance, it’s true, but I've learned through hard experience that it's often what separates organisations that can scale their AI initiatives successfully from those that become mired in confusion and repeated mistakes.
Think about what happens six months after deploying an AI system when someone asks: "Why did we choose this particular approach for handling edge cases?" or "What evidence did we consider when setting these fairness thresholds?" Without proper documentation, these questions send teams scrambling through old emails and meeting notes, often ending in uncertainty. But with a well-designed documentation system, you can quickly trace the logic behind key decisions and apply those insights to new challenges.
The heart of your documentation system should be what I call "decision records" - clear narratives that capture not just what was decided but why. These aren't just technical specifications; they're stories that explain the context, alternatives considered, and reasoning behind important choices. When documenting model development decisions, for instance, you need to capture why certain architectural choices were made, what experiments led to key parameters, and how you arrived at specific performance thresholds.
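If it helps to see the shape of one, here is a sketch of the fields a decision record might carry, expressed as a Python dataclass. The field names and the worked example are illustrative assumptions, not a prescribed template.

```python
# A sketch of what a decision record could capture, as a Python dataclass.
# Field names and example values are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DecisionRecord:
    title: str
    decision: str
    context: str                        # what prompted the decision
    alternatives_considered: list[str]  # options that were rejected, and why
    rationale: str                      # why the chosen option won
    evidence: list[str]                 # experiments, benchmarks, stakeholder input
    decided_on: date = field(default_factory=date.today)
    review_by: Optional[date] = None    # when the decision should be revisited

record = DecisionRecord(
    title="Fairness threshold for credit model v3",
    decision="Cap the demographic parity gap at 0.08",
    context="Pilot results plus internal ethics review",
    alternatives_considered=["0.05 - too restrictive for recall", "0.10 - default"],
    rationale="Best observed trade-off between recall and parity in the pilot",
    evidence=["pilot evaluation report", "ethics board minutes"],
)
```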
Your system also needs to maintain what I think of as a continuous thread of evidence. This means documenting the entire lifecycle of your AI systems in a way that lets you reconstruct what happened at any point. You should be able to trace how data was collected and processed, how models were trained and validated, and how performance has evolved over time. This isn't about creating enormous documents - it's about maintaining clear, searchable records that help you understand your systems' behaviour and evolution.
Incident documentation requires particular attention. When something goes wrong with an AI system, you need more than just a record of what happened. You need to capture the investigation process, the root causes identified, and most importantly, the lessons learned. I've seen teams repeatedly encounter the same issues because they didn't properly document and share insights from past incidents.
There's a particularly thorny problem I've encountered repeatedly in AI system documentation, one that stems from the extraordinary pace of AI innovation itself: zombie documentation. Picture this: your team is responding to a critical incident at 2 AM, and you turn to your system documentation for guidance. You find what appears to be a detailed architectural document, complete with data flows and model specifications. Excellent - but as you dig deeper, your heart sinks - this documentation, so carefully crafted a year ago, now describes a system that no longer exists. The rapid iterations of model improvements, the countless small adjustments to data pipelines, the accumulated tweaks to hyperparameters - none of these are reflected in what you're reading. This zombie documentation, neither fully dead nor truly alive, is often more dangerous than having no documentation at all. It creates a false sense of security, leading teams down blind alleys during critical moments when clarity is most needed. Skilled engineers can waste precious hours trying to debug issues based on outdated assumptions, only to discover that the actual system has evolved far beyond its documented design. The solution isn't just to write better documentation - it's to make documentation maintenance an integral part of your change management process. Every significant model update, data pipeline modification, or architectural shift must trigger a corresponding documentation review and update. Your documentation has to evolve alongside your AI systems, not sit as a historical artifact preserved in amber.
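One pragmatic way to enforce that link is a simple check in your merge pipeline. The sketch below, with assumed repository paths and comparison branch, fails a change that touches model or pipeline code without touching the documentation alongside it.

```python
# A sketch of a pre-merge check that fails when model or pipeline files
# change without a matching documentation update. Watched paths and the
# comparison branch are assumptions about your repository layout.
import subprocess
import sys

WATCHED_PATHS = ("models/", "pipelines/", "training/")
DOCS_PATHS = ("docs/",)

def changed_files(base: str = "origin/main") -> list[str]:
    diff = subprocess.check_output(["git", "diff", "--name-only", base]).decode()
    return [line for line in diff.splitlines() if line]

def docs_kept_in_step() -> int:
    changes = changed_files()
    touched_system = any(f.startswith(WATCHED_PATHS) for f in changes)
    touched_docs = any(f.startswith(DOCS_PATHS) for f in changes)
    if touched_system and not touched_docs:
        print("Model or pipeline changed but docs/ was not - documentation review required.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(docs_kept_in_step())
```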
What You Will Need to Do:
➡️ Create your documentation architecture:
Design templates for key document types (decision records, specifications, incident reports)
Establish a clear filing system that makes documents easy to find
Define retention requirements for different types of documentation
Set up version control for living documents
Create mechanisms for linking related documents together
➡️ Build your evidence trail:
Define what needs to be documented at each stage of AI development
Create templates for recording model training and validation results
Establish procedures for documenting data lineage and transformations
Set up systems for tracking model performance over time
Build mechanisms for documenting system changes and updates
➡️ Implement your decision documentation process:
Create templates for capturing decision context and rationale
Define requirements for documenting alternatives considered
Establish procedures for recording stakeholder input
Set up systems for tracking decision implementation
Build mechanisms for reviewing and updating past decisions
The key is making documentation feel like a natural part of the work rather than an administrative burden. Your documentation system should help teams work more effectively by making it easy to find information when they need it. It should capture knowledge in a way that helps your organisation learn and improve over time. It is worth the effort to build documentation systems that focus on practical utility rather than procedural compliance. Make it easy for teams to record important information in the flow of their work and even easier to find that information later when it's needed.
Remember, you don't need to document everything - focus on capturing the information that will be most valuable for future understanding and decision-making. Start with documenting your most critical systems and decisions, and build out from there as you learn what documentation proves most useful in practice.
One last point: Don’t choose tools too early!
Let me add an important note about tools and automation in your AI Management System. Think about tools as amplifiers of good processes, not replacements for them. When you have clear governance mechanisms and workflows established, the right tools can make them more efficient and reliable. But I've watched organisations jump into purchasing expensive governance platforms before they really understood their own needs, only to find themselves adapting their processes to fit the tool rather than the other way around.
The key is to start simple and grow your tooling organically. In the early stages of your AI Management System, basic tools like shared documentation repositories, version control systems, and simple monitoring dashboards might be all you need. These let you establish and refine your core processes without getting locked into specific workflows or complex integrations.
Once your processes are stable and you understand where automation would add the most value, then you can start expanding your toolset strategically. For instance, you might add automated monitoring tools that alert teams when model performance drifts beyond acceptable thresholds, or documentation systems that automatically capture model training parameters and data lineage. But each tool should solve a specific, well-understood need rather than promising a complete governance solution.
I've found that the most successful organisations build their tooling in layers. They might start with basic collaboration tools and version control, then add monitoring dashboards, then gradually introduce more sophisticated tools for things like automated bias detection or drift monitoring. At each stage, they ensure their processes drive their tool selection, not the other way around.
Be particularly careful with all-in-one governance platforms that promise to solve everything at once. While these may be valuable for mature organisations with well-established processes (although personally, I've yet to experience one), they can become restrictive straitjackets for teams still figuring out their governance needs. If they fit well, they can accelerate your program, but teams can waste months trying to force their workflows into rigid platform structures when they would have been better served by simpler, more flexible tools like Excel.
Remember, the goal of automation isn't to remove human judgment but to free up human attention for where it's most valuable. Good tools should help teams focus on meaningful decisions by handling routine tasks and providing clear signals when something needs attention. They should make it easier to follow good practices, not force compliance through rigid controls. The right time to invest in more sophisticated tools is when you find yourself repeatedly saying "I wish we had a better way to..." That's when you know you understand your needs well enough to choose tools that will truly help rather than hinder. Until then, focus on getting your processes right with simple, flexible tools that won't lock you into particular ways of working.
Throughout this series of articles, we've explored the foundations of effective AI governance - from making the business case and securing leadership commitment to building out the four pillars of an AI Management System: governance, risk management, operational processes, and documentation. We've mapped the 'why' and 'what' of AI governance. Now it's time to turn to the 'how.'
Our next article begins with a fundamental principle I've learned implementing these systems: you can't govern what you don't see. We'll explore the critical first step in building an effective AIMS - creating your AI System Inventory, the map of your organisation's AI landscape. Knowing what you have is the first step to governing it well. I hope you join me by subscribing.
[1] https://www.ethos-ai.org/p/what-is-in-a-real-world-ai-management
[2] https://atlas.mitre.org/
[3] https://airisk.mit.edu/
[4] https://www.iso.org/standard/65694.html
[5] https://www.nist.gov/itl/ai-risk-management-framework
[6] https://www.first.org/cvss/
[7] https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4ZPmV
[8] https://dvc.org/