Designing for Uncertainty
AI output is probabilistic; it is never 100% certain. Good AI product design communicates that uncertainty clearly, so users can calibrate their trust appropriately and don't over-rely on AI output.
Confidence scores and what they mean
Most AI models produce a confidence score alongside their output — a number between 0 and 1 representing how certain the model is. A cross-reference extraction model might return '0.94 confidence that this reference points to Section 4.2(b)'.
But confidence scores are tricky:
- A model can be overconfident: a high score on a wrong answer.
- Confidence is domain-specific: a model calibrated on English contracts may be systematically overconfident on Japanese ones.
- Raw numbers (0.94) mean nothing to a lawyer; they need to be translated into actionable signals.
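Overconfidence and domain drift can be detected with a basic calibration check: bucket predictions by reported confidence and compare the mean confidence to the observed accuracy in each bucket. A minimal sketch; the sample predictions and bin edges are illustrative assumptions, not any real pipeline:

```python
from collections import defaultdict

def calibration_report(predictions, bins=(0.5, 0.7, 0.9, 1.01)):
    """Group (confidence, was_correct) pairs into confidence bins and
    compare mean reported confidence to observed accuracy per bin."""
    buckets = defaultdict(list)
    for conf, correct in predictions:
        for lo, hi in zip(bins, bins[1:]):
            if lo <= conf < hi:
                buckets[(lo, hi)].append((conf, correct))
                break
    report = {}
    for (lo, hi), items in sorted(buckets.items()):
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        report[(lo, hi)] = (round(mean_conf, 3), round(accuracy, 3))
    return report

# A well-calibrated model's accuracy tracks its confidence in each bin;
# a gap (e.g. 0.94 mean confidence but 0.67 accuracy) signals overconfidence.
preds = [(0.95, True), (0.92, False), (0.96, True), (0.75, True), (0.72, False)]
report = calibration_report(preds)
```

Running this check separately per domain (e.g. English vs. Japanese contracts) is what surfaces the systematic overconfidence described above.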
The PM's job is to define how raw confidence scores translate into user-facing signals.
Aji in Go means 'taste' — potential moves or complications that haven't materialized yet. Low-confidence AI suggestions are like aji: there's something there, but it hasn't been resolved. Good design acknowledges the aji without forcing a decision.
Confidence tiers in product design
Rather than showing raw numbers, convert confidence to actionable tiers:
High confidence (>0.90): Show prominently as a clear finding. User should act or explicitly dismiss.
Medium confidence (0.70–0.90): Show with a 'review recommended' label. Don't interrupt the workflow — surface in a sidebar.
Low confidence (<0.70): Don't show in the main UI. Either suppress entirely, or include in a 'possible issues' section the user can expand if they want.
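The tiering above amounts to a small mapping function. The 0.90 and 0.70 thresholds come from the tiers as described; the tier names and labels are illustrative:

```python
from enum import Enum

class Tier(Enum):
    HIGH = "clear finding"          # show prominently; user acts or dismisses
    MEDIUM = "review recommended"   # surface in a sidebar, don't interrupt
    LOW = "possible issue"          # suppress, or hide behind an expandable section

def confidence_tier(score: float, high: float = 0.90, low: float = 0.70) -> Tier:
    """Map a raw model confidence score to a user-facing tier.
    The defaults mirror the thresholds in the text; in practice
    they are tuned per feature as a product decision."""
    if score > high:
        return Tier.HIGH
    if score >= low:
        return Tier.MEDIUM
    return Tier.LOW
```

Keeping the thresholds as parameters rather than hard-coded constants makes the product decision explicit and easy to revisit per feature.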
These thresholds are a product decision, not a technical one. They balance the cost of false positives (user annoyance, automation bias risk) vs. false negatives (missing a real error).
Designing fallbacks and graceful degradation
Every AI feature needs a fallback for when it can't produce a confident output:
- The 'I don't know' fallback: If confidence is below threshold, don't surface anything. Better to surface nothing than a misleading low-confidence suggestion.
- The manual fallback: If the AI can't parse a section correctly (e.g., unusual formatting), fall back to showing the raw text with a note: 'This section couldn't be analyzed — review manually.'
- The scope fallback: If a feature doesn't work for a particular document type or language, disable it gracefully with an explanation, not a generic error.
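The three fallbacks combine into one decision path per section. A minimal sketch; the `AnalysisResult` shape, the language-based scope check, and the message strings are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnalysisResult:
    suggestion: Optional[str]   # None when the model produced nothing usable
    confidence: float
    parsed: bool                # False when the section couldn't be parsed

SUPPORTED_LANGUAGES = {"en"}    # illustrative scope limit

def render_section(result: Optional[AnalysisResult], language: str,
                   threshold: float = 0.70) -> str:
    # Scope fallback: disable gracefully with an explanation, not a generic error.
    if language not in SUPPORTED_LANGUAGES:
        return "This feature isn't available for this document language yet."
    # Manual fallback: point the user at the raw text when parsing failed.
    if result is None or not result.parsed:
        return "This section couldn't be analyzed — review manually."
    # 'I don't know' fallback: below threshold, surface nothing at all.
    if result.confidence < threshold:
        return ""
    return result.suggestion
```

Note the ordering: scope and parsing failures are checked before confidence, so the user always gets the most specific explanation available.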
At BoostDraft, graceful fallbacks are especially important because users are legal professionals who expect tools to be reliable. An AI feature that sometimes produces confusing output is worse than no AI feature.