Monitoring AI Quality
How to track AI quality with evaluator scores, feedback signals, and review filters.
Use both operator feedback and automated evaluator signals to monitor AI quality.
Main quality views
- AI quality report API (/api/reports/ai-quality)
  - acceptance rate
  - edit rate
  - rejection rate
- Live chat quality filters
  - low quality sessions
  - hallucination flag
  - circular response flag
  - negative feedback count
Conversation evaluator
Both completed widget sessions and auto-replied email tickets are evaluated automatically each night with the same structured metrics:
- accuracy
- completeness
- resolution
- hallucination flag
- circular flag
- question type key
A composite quality score is stored and surfaced as a badge: on the Live Chat list for widget sessions and on the inbox ticket list for email tickets. Conversations that score below 70%, or that trip the hallucination or circular flag, are routed to the Review queue for triage.
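The routing rule above can be sketched as a single predicate. This is a minimal illustration, not the product's actual implementation: the `Evaluation` class, its field names, and the 0.0 to 1.0 score scale are assumptions made for the example; only the 70% threshold and the two flags come from the text.

```python
from dataclasses import dataclass

# "Below 70%" from the text, expressed as a fraction (assumed 0.0-1.0 scale).
REVIEW_THRESHOLD = 0.70


@dataclass
class Evaluation:
    # Hypothetical field names that mirror the metrics listed above.
    quality_score: float
    hallucination_flag: bool
    circular_flag: bool


def needs_review(ev: Evaluation) -> bool:
    """Return True when a conversation should be routed to the Review queue."""
    return (
        ev.quality_score < REVIEW_THRESHOLD
        or ev.hallucination_flag
        or ev.circular_flag
    )
```

Note that under this reading a conversation scoring exactly 70% is not routed unless a flag is set, and a high-scoring conversation is still routed when either flag trips.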
Sampling follows the workspace's evaluation mode (full / ramping / spot_check), with forced evaluation for negative-feedback messages, low-confidence auto-sends, and unseen question types.
What to watch weekly
- rising rejection or edit rates in one category
- repeated hallucination flags for the same question type
- low-quality clusters after KB or policy changes
- high negative-feedback sessions that were not escalated
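As an example of the second check, repeated hallucination flags can be grouped by question type. The sketch below assumes evaluation records are available as dictionaries with `question_type_key` and `hallucination_flag` fields; that record shape and the `min_count` cutoff are assumptions for illustration, not a documented export format.

```python
from collections import Counter


def repeated_hallucination_types(evaluations, min_count=2):
    """Return question types flagged for hallucination at least min_count times."""
    counts = Counter(
        ev["question_type_key"]
        for ev in evaluations
        if ev["hallucination_flag"]
    )
    return {key: n for key, n in counts.items() if n >= min_count}


# Made-up sample records, for illustration only.
week = [
    {"question_type_key": "refund_policy", "hallucination_flag": True},
    {"question_type_key": "refund_policy", "hallucination_flag": True},
    {"question_type_key": "shipping_time", "hallucination_flag": True},
    {"question_type_key": "shipping_time", "hallucination_flag": False},
]
repeated = repeated_hallucination_types(week)
```

With this sample data, only `refund_policy` is reported, because it is the only question type flagged twice.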
Closing the loop
The fastest path to fixing a bad response pattern is AI → Lessons → Review queue. Every eval-flagged conversation lands there, and a single coaching note becomes, overnight, a lesson the AI applies to similar future tickets. See How the AI learns from your team.
For broader pattern fixes:
- Add/refresh KB coverage for that scenario.
- Tighten instructions for risky behavior.
- Raise the confidence threshold for that flow.
- Leave coaching notes on representative flagged tickets.
- Monitor acceptance/edit/rejection deltas over the next week.
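The last step can be sketched as a comparison of two weekly snapshots of the three report rates. The dictionary keys and the idea of storing rates as fractions are assumptions for this example; the response shape of /api/reports/ai-quality is not documented here, so the snapshots are built by hand.

```python
RATE_KEYS = ("acceptance_rate", "edit_rate", "rejection_rate")


def rate_deltas(previous, current):
    """Return current minus previous for each rate, rounded to 4 decimals."""
    return {key: round(current[key] - previous[key], 4) for key in RATE_KEYS}


# Made-up snapshots, for illustration only (rates as fractions of AI replies).
before_fix = {"acceptance_rate": 0.70, "edit_rate": 0.20, "rejection_rate": 0.10}
after_fix = {"acceptance_rate": 0.78, "edit_rate": 0.16, "rejection_rate": 0.06}
deltas = rate_deltas(before_fix, after_fix)
```

A positive acceptance delta together with negative edit and rejection deltas suggests the fix is working; movement in the opposite direction is a signal to revisit the KB, instructions, or threshold changes.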
Nightly aggregation jobs keep quality insights fresh, but immediate operator feedback signals are still the fastest indicator of drift.