Designing for Uncertainty
AI output is probabilistic; it is never 100% certain. Good AI product design communicates that uncertainty clearly, so users can calibrate their trust appropriately and don't over-rely on AI output.
Confidence scores and what they mean
Most AI models produce a confidence score alongside their output — a number between 0 and 1 representing how certain the model is. A cross-reference extraction model might return '0.94 confidence that this reference points to Section 4.2(b)'.
But confidence scores are tricky:
- A model can be overconfident: a high score on a wrong answer.
- Confidence is domain-specific: a model calibrated on English contracts may be systematically overconfident on Japanese ones.
- Raw numbers (0.94) mean nothing to a lawyer; they need to be translated into actionable signals.
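Overconfidence and domain drift can be detected with a basic calibration check: bucket predictions by reported confidence and compare the mean confidence to the observed accuracy in each bucket. A minimal sketch; the sample predictions and bin edges are illustrative assumptions, not any real pipeline:

```python
from collections import defaultdict

def calibration_report(predictions, bins=(0.5, 0.7, 0.9, 1.01)):
    """Group (confidence, was_correct) pairs into confidence bins and
    compare mean reported confidence to observed accuracy per bin."""
    buckets = defaultdict(list)
    for conf, correct in predictions:
        for lo, hi in zip(bins, bins[1:]):
            if lo <= conf < hi:
                buckets[(lo, hi)].append((conf, correct))
                break
    report = {}
    for (lo, hi), items in sorted(buckets.items()):
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        report[(lo, hi)] = (round(mean_conf, 3), round(accuracy, 3))
    return report

# A well-calibrated model's accuracy tracks its confidence in each bin;
# a gap (e.g. 0.94 mean confidence but 0.67 accuracy) signals overconfidence.
preds = [(0.95, True), (0.92, False), (0.96, True), (0.75, True), (0.72, False)]
report = calibration_report(preds)
```

Running this check separately per domain (e.g. English vs. Japanese contracts) is what surfaces the systematic overconfidence described above.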
The PM's job is to define how raw confidence scores translate into user-facing signals.
Aji in Go means 'taste' — potential moves or complications that haven't materialized yet. Low-confidence AI suggestions are like aji: there's something there, but it hasn't been resolved. Good design acknowledges the aji without forcing a decision.
Confidence tiers in product design
Rather than showing raw numbers, convert confidence to actionable tiers:
High confidence (>0.90): Show prominently as a clear finding. User should act or explicitly dismiss.
Medium confidence (0.70–0.90): Show with a 'review recommended' label. Don't interrupt the workflow — surface in a sidebar.
Low confidence (<0.70): Don't show in the main UI. Either suppress entirely, or include in a 'possible issues' section the user can expand if they want.
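The tiering above amounts to a small mapping function. The 0.90 and 0.70 thresholds come from the tiers as described; the tier names and labels are illustrative:

```python
from enum import Enum

class Tier(Enum):
    HIGH = "clear finding"          # show prominently; user acts or dismisses
    MEDIUM = "review recommended"   # surface in a sidebar, don't interrupt
    LOW = "possible issue"          # suppress, or hide behind an expandable section

def confidence_tier(score: float, high: float = 0.90, low: float = 0.70) -> Tier:
    """Map a raw model confidence score to a user-facing tier.
    The defaults mirror the thresholds in the text; in practice
    they are tuned per feature as a product decision."""
    if score > high:
        return Tier.HIGH
    if score >= low:
        return Tier.MEDIUM
    return Tier.LOW
```

Keeping the thresholds as parameters rather than hard-coded constants makes the product decision explicit and easy to revisit per feature.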
These thresholds are a product decision, not a technical one. They balance the cost of false positives (user annoyance, automation bias risk) vs. false negatives (missing a real error).
Designing fallbacks and graceful degradation
Every AI feature needs a fallback for when it can't produce a confident output:
- The 'I don't know' fallback: If confidence is below threshold, don't surface anything. Better to surface nothing than a misleading low-confidence suggestion.
- The manual fallback: If the AI can't parse a section correctly (e.g., unusual formatting), fall back to showing the raw text with a note: 'This section couldn't be analyzed — review manually.'
- The scope fallback: If a feature doesn't work for a particular document type or language, disable it gracefully with an explanation, not a generic error.
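The three fallbacks combine into one decision path per section. A minimal sketch; the `AnalysisResult` shape, the language-based scope check, and the message strings are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnalysisResult:
    suggestion: Optional[str]   # None when the model produced nothing usable
    confidence: float
    parsed: bool                # False when the section couldn't be parsed

SUPPORTED_LANGUAGES = {"en"}    # illustrative scope limit

def render_section(result: Optional[AnalysisResult], language: str,
                   threshold: float = 0.70) -> str:
    # Scope fallback: disable gracefully with an explanation, not a generic error.
    if language not in SUPPORTED_LANGUAGES:
        return "This feature isn't available for this document language yet."
    # Manual fallback: point the user at the raw text when parsing failed.
    if result is None or not result.parsed:
        return "This section couldn't be analyzed — review manually."
    # 'I don't know' fallback: below threshold, surface nothing at all.
    if result.confidence < threshold:
        return ""
    return result.suggestion
```

Note the ordering: scope and parsing failures are checked before confidence, so the user always gets the most specific explanation available.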
At BoostDraft, graceful fallbacks are especially important because users are legal professionals who expect tools to be reliable. An AI feature that sometimes produces confusing output is worse than no AI feature.