
Designing an AI-assisted review workspace for high-stakes decisions at global scale.
Evidence-first AI-assisted review for high-stakes decisions.
Business context: Recorded exam review at scale creates a signal-to-noise problem. Reviewers can’t watch thousands of hours manually, and AI flags alone aren’t sufficient for decisions with real consequences.
What I led: I designed an evidence-first, triage-based workstation that ranks sessions by risk, surfaces contextual proof, and keeps human judgment firmly in the loop: a human-in-the-loop design pattern for agentic AI operating under high-stakes constraints.

AI flags without context increase risk and reviewer fatigue.
Moving from live proctoring to recorded review shifts the bottleneck: reviewers need to find brief infractions quickly, but raw AI flags are often noisy and can introduce automation bias when presented without supporting evidence.
Owned the review experience and partnered with AI/ML and investigation teams.

High-stakes adjudication, auditability, and automation-bias risk.
Triage first, evidence always, and structured decisions.
I structured the workflow around triage, helping reviewers spend attention where it matters most, then built an evidence model that makes each flag inspectable and comparable. The UI supports fast review without turning the human into a rubber stamp.
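The triage-then-evidence flow described above can be sketched in code. This is a minimal illustration, not the production system: the class names, flag types, and the sum-of-confidences scoring rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Flag:
    """One AI-detected event, always paired with inspectable evidence."""
    kind: str            # hypothetical category, e.g. "second_voice"
    timestamp_s: float   # where in the recording the event occurs
    confidence: float    # model confidence in [0, 1]
    evidence_clip: str   # pointer to the recording segment backing the flag

@dataclass
class Session:
    """A recorded exam session with its accumulated flags."""
    session_id: str
    flags: list[Flag] = field(default_factory=list)

    def risk_score(self) -> float:
        # Assumed aggregate: sum of flag confidences, so sessions with
        # many high-confidence flags rise to the top of the queue.
        return sum(f.confidence for f in self.flags)

def triage(sessions: list[Session]) -> list[Session]:
    # Rank sessions so reviewer attention goes where risk is highest;
    # each flag still carries its evidence clip for human inspection.
    return sorted(sessions, key=lambda s: s.risk_score(), reverse=True)
```

The design point the sketch encodes: ranking only orders the queue; every flag keeps a pointer to its evidence, so the human decision is made against proof, not against a bare score.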

Design for adjudication: proof packaging and confidence handoffs.

Higher throughput with stronger agreement and clearer decisions.
The workstation improved review throughput versus live proctoring while supporting consistent decisions and strong agreement between reviewers and investigators.
See the Outcomes panel for the throughput and agreement metrics tracked.
In agentic AI systems, the design contract is explainability, not the model.
The senior design work is making complex signals usable and fair: clear evidence, defensible decisions, and a workflow that respects human judgment.