AI + Leadership
    10 min read · 23 March 2026

    Situational Judgement and Exception Handling in Human-AI Teams

    In any system where AI operates with autonomy, the most consequential human contribution is knowing when to intervene. The Exception Triage Framework and pre-mortem thinking give orchestrators a structured approach to that judgement.

    Gemma Torregrosa

    Growth Performance

    In any system where AI agents operate with a degree of autonomy, the most consequential human contribution is knowing when to intervene. Klein's (1998) research on naturalistic decision-making demonstrates that experienced professionals in high-stakes environments do not make decisions by exhaustively analysing all available options. They recognise patterns, match the current situation to their repertoire of experience, and act on the first option that appears workable. This recognition-primed decision model describes precisely the cognitive skill that effective AI orchestrators need: the ability to rapidly assess whether a situation is within normal parameters or requires human override.

    The challenge in human-AI collaboration is that the "normal parameters" are themselves shifting. As AI systems become more capable, the boundary between safe delegation and required intervention moves. Agrawal, Gans and Goldfarb (2022) argue that organisations must continuously recalibrate their decision rights as AI capability evolves. The orchestrator's situational judgement must keep pace with this evolution.

    Three categories of intervention decision

    Effective situational judgement in human-AI teams requires the orchestrator to make three distinct types of decision rapidly and accurately.

    The delegation decision: is this task, in this context, safe to assign to AI with minimal human oversight? This requires assessing the task across multiple dimensions simultaneously: complexity, stakes, novelty, and the AI system's track record with similar tasks. Routine, well-defined, low-stakes tasks with a strong AI performance history are candidates for full delegation. Tasks that involve ambiguity, high stakes, or contexts the AI has not previously encountered require closer human involvement.

    The intervention decision: the AI is already operating, and something unexpected has occurred. Should the orchestrator step in, and if so, how? Raisch and Krakowski (2021) note that the most common error in this category is delayed intervention, where the human notices a potential problem but waits to see if the AI self-corrects, losing valuable time and allowing the error to propagate through the workflow.

    The escalation decision: this situation exceeds both the AI's authority and the orchestrator's own. It must be referred to someone with greater expertise, broader authority, or deeper contextual understanding. Effective escalation requires the orchestrator to recognise the limits of their own judgement as well as the AI's, a form of metacognitive awareness that is difficult to develop but essential for safe operation.
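
    To make the three decision types concrete, here is a minimal sketch in Python. It is illustrative only: the `TaskContext` fields mirror the dimensions named above (complexity, stakes, novelty, track record), but the thresholds and the `within_orchestrator_authority` flag are assumptions, not prescriptions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Decision(Enum):
    DELEGATE = auto()   # safe for the AI with minimal oversight
    SUPERVISE = auto()  # AI proceeds under close human involvement
    ESCALATE = auto()   # beyond both the AI's and the orchestrator's remit


@dataclass
class TaskContext:
    """Hypothetical rubric: each dimension scored from 0 (low) to 1 (high)."""
    complexity: float
    stakes: float
    novelty: float
    ai_track_record: float  # historical success rate on similar tasks


def delegation_decision(task: TaskContext,
                        within_orchestrator_authority: bool = True) -> Decision:
    """Route a task using the dimensions named in the text.

    The 0.5 and 0.7 thresholds are placeholders, not recommendations;
    a real rubric would be calibrated against the organisation's own
    exception history.
    """
    if not within_orchestrator_authority:
        return Decision.ESCALATE
    routine = task.complexity < 0.5 and task.novelty < 0.5
    low_stakes = task.stakes < 0.5
    proven = task.ai_track_record > 0.7
    if routine and low_stakes and proven:
        return Decision.DELEGATE
    return Decision.SUPERVISE


# A novel, high-stakes task is held back for supervision:
print(delegation_decision(
    TaskContext(complexity=0.3, stakes=0.8, novelty=0.9, ai_track_record=0.6)))
# -> Decision.SUPERVISE
```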

    There is also an intervention paradox to be aware of: the more reliably an AI system performs, the harder it becomes for human supervisors to maintain the vigilance needed to catch the rare failures. Research in aviation and nuclear safety has documented this pattern extensively. Sustained attention to a process that almost never fails is cognitively demanding and psychologically draining. Effective orchestration systems must be designed with this constraint in mind, incorporating active monitoring tasks rather than passive surveillance.
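
    One common mitigation is to seed the review stream with known failures, so that monitoring becomes an active detection task rather than open-ended watching. The sketch below assumes this seeded-audit approach; the function names and the 5% seed rate are hypothetical.

```python
import random


def build_review_queue(outputs, canaries, seed_rate=0.05, rng=None):
    """Interleave known-bad 'canary' items into a human review queue.

    Each entry is a (payload, is_canary) pair. The 5% seed rate is an
    arbitrary placeholder.
    """
    rng = rng or random.Random()
    queue = [(item, False) for item in outputs]
    n_seeds = min(len(canaries), max(1, int(len(outputs) * seed_rate)))
    for canary in rng.sample(canaries, n_seeds):
        queue.insert(rng.randrange(len(queue) + 1), (canary, True))
    return queue


def vigilance_score(queue, flagged):
    """Fraction of seeded canaries the reviewer actually caught.

    A falling score signals degraded vigilance even when the AI's
    genuine outputs remain error-free.
    """
    canaries = [item for item, is_canary in queue if is_canary]
    return sum(1 for c in canaries if c in flagged) / len(canaries)
```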

    Pattern recognition and the role of experience

    Klein's (1998) research demonstrates that expert decision-making under uncertainty is fundamentally a pattern-recognition capability. Experienced professionals are not faster at running through decision algorithms. They recognise familiar configurations, understand what they typically signal, and respond appropriately without conscious deliberation.

    For AI orchestrators, pattern recognition operates across two dimensions. The first is operational pattern recognition: does this AI output or workflow state look like previous instances that led to problems? The orchestrator who has encountered multiple cases of AI confabulation in similar contexts develops an intuitive sense for when outputs warrant closer scrutiny. The second is contextual pattern recognition: does this situation as a whole signal elevated risk? A client with an urgent deadline, a novel task type, and a team member new to AI collaboration may together indicate a context where closer human involvement is warranted, even if the AI's output looks technically sound.

    Pattern recognition cannot be taught through instruction alone. It develops through deliberate experience with reflection. Organisations that create structured opportunities for orchestrators to review cases, analyse exceptions, and discuss decision-making build this capability systematically rather than relying on accumulated experience alone.

    The Exception Triage Framework

    Effective exception handling in human-AI workflows requires both individual capability and structural support. At the individual level, orchestrators need clear mental models for categorising exceptions. The Exception Triage Framework provides four categories (a code sketch follows the list):

    Green: routine variance. The AI output is slightly off but within acceptable parameters. The orchestrator corrects and moves on. No escalation required. Logged for pattern tracking.

    Amber: significant deviation. The AI output is materially wrong, or the situation has moved outside the AI's established operating parameters. The orchestrator intervenes, corrects the workflow, and documents the deviation for review. May require peer consultation.

    Red: critical exception. The AI has produced output or taken action that creates risk to people, compliance, reputation, or safety. Immediate human override. Escalation to appropriate authority. Formal incident review required.

    Grey: uncertain. The orchestrator is unsure whether the situation requires intervention. The default in this category should always be to pause, seek input, and err on the side of human oversight rather than allowing autonomous processing to continue while uncertain.
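
    The framework translates naturally into a routing table. The minimal Python sketch below mirrors the category descriptions above; the roles and actions are placeholders to be replaced by an organisation's own protocol.

```python
from enum import Enum


class Severity(Enum):
    GREEN = "routine variance"
    AMBER = "significant deviation"
    RED = "critical exception"
    GREY = "uncertain"


# Routing table for the four categories described above.
TRIAGE_ACTIONS = {
    Severity.GREEN: {"action": "correct and continue",
                     "escalate_to": None,
                     "record": "log for pattern tracking"},
    Severity.AMBER: {"action": "intervene and correct the workflow",
                     "escalate_to": "peer consultation if needed",
                     "record": "document deviation for review"},
    Severity.RED: {"action": "immediate human override",
                   "escalate_to": "appropriate authority",
                   "record": "formal incident review"},
    Severity.GREY: {"action": "pause autonomous processing",
                    "escalate_to": "seek input before resuming",
                    "record": "log the uncertainty and its resolution"},
}


def handle_exception(severity: Severity | None) -> dict:
    """When even the category is unclear, default to GREY handling."""
    return TRIAGE_ACTIONS.get(severity, TRIAGE_ACTIONS[Severity.GREY])
```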

    At the structural level, organisations need exception handling protocols that specify who is authorised to override AI decisions at each level of severity, what documentation is required when a human override occurs, how exceptions are fed back into system improvement, and what happens when the person monitoring the AI is uncertain about whether intervention is needed.
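
    As an illustration of the documentation element, a hypothetical override record might capture the fields below. Everything here is an assumption about what a protocol could require, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OverrideRecord:
    """Sketch of the documentation a human override might capture.

    Field names are illustrative; a real protocol would align them with
    the organisation's audit and incident requirements.
    """
    workflow_id: str
    severity: str            # "green" | "amber" | "red" | "grey"
    overridden_by: str       # must hold override authority for this severity
    reason: str              # why the human judged intervention necessary
    ai_output_snapshot: str  # what the AI produced at the moment of override
    corrective_action: str   # what the human did instead
    feeds_back_into: str     # where the lesson lands: prompt, model, process
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```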

    Pre-mortem thinking for AI workflows

    One of the most effective techniques for building situational judgement is the pre-mortem: before launching an AI-assisted workflow, the orchestrator imagines that the process has failed and works backwards to identify the most likely causes. This technique, drawn from Klein's (1998) research on prospective hindsight, surfaces risks that might otherwise be overlooked because the team is focused on executing rather than anticipating.

    In human-AI workflows, the pre-mortem should specifically address three questions. First, what types of input might the AI misinterpret? Second, under what conditions might the AI produce confident but incorrect output? Third, at what points in the workflow would a human be least likely to notice an error? The answers to these questions inform where to place human checkpoints, what to monitor, and what escalation triggers to establish.
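
    The output of a pre-mortem can be captured in a simple structure that ties each imagined failure back to the question that surfaced it, the checkpoint it informs, and the escalation trigger it establishes. The scenarios below are invented for illustration, not drawn from a real workflow.

```python
from dataclasses import dataclass


@dataclass
class FailureHypothesis:
    """One imagined failure from a pre-mortem and the control it informs."""
    failure: str             # how the workflow failed, stated in past tense
    question: str            # which of the three questions surfaced it
    checkpoint: str          # the human check placed in response
    escalation_trigger: str  # the observation that forces escalation


PRE_MORTEM = [
    FailureHypothesis(
        failure="The AI misread a scanned form with handwritten fields.",
        question="What input might the AI misinterpret?",
        checkpoint="Human review of all non-digital source documents.",
        escalation_trigger="Extracted values conflict with the source record."),
    FailureHypothesis(
        failure="A confident summary cited a clause that does not exist.",
        question="When might the AI be confidently wrong?",
        checkpoint="Verify citations against the source before release.",
        escalation_trigger="Any citation that cannot be located verbatim."),
    FailureHypothesis(
        failure="An error slipped through the automated hand-off step.",
        question="Where is a human least likely to notice an error?",
        checkpoint="Sampled audit of hand-off outputs at a fixed rate.",
        escalation_trigger="Audit failure rate above the agreed threshold."),
]
```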

    Reflection prompts for practitioners

    Think about an AI-augmented workflow you manage. Where are the points at which a human must actively decide whether the output is acceptable before the workflow proceeds? Are those checkpoints formally designed, or do they rely on individual initiative?

    When was the last time you hesitated about whether to intervene in an AI process? What information would have helped you make the decision more confidently?

    If you applied a pre-mortem to your most important AI-augmented workflow right now, what would you identify as the most likely failure point?


    References

    Agrawal, A., Gans, J. and Goldfarb, A. (2022) Power and Prediction: The Disruptive Economics of Artificial Intelligence. Boston: Harvard Business Review Press.

    Klein, G. (1998) Sources of Power: How People Make Decisions. Cambridge, MA: MIT Press.

    Raisch, S. and Krakowski, S. (2021) 'Artificial Intelligence and Management', Academy of Management Review, 46(1), pp. 192-210.
