Why low-confidence metadata proposals disappear after 30 days

Auto-classify proposes metadata on every uploaded doc. Proposals below 0.5 confidence that go unreviewed for 30 days flip to 'expired' so the review queue stays focused on actionable items.

Updated 2026-05-03

When you upload a document, Kodori's auto-classify pipeline proposes metadata — sensitivity tier, target collection, keywords, doc type, retention class, AI summary. Each proposal lands in the review queue on `/doc/[id]` with a confidence score the model produced.

Most proposals are high-confidence and get accepted within the first day or two. Some land at low confidence (below 0.5, the model's own midpoint) and the operator never acts on them: the proposal was wrong, the field doesn't matter for that doc, or the operator is simply busy.

**The cleanup rule.** Every Sunday at 07:00 UTC, a background cron flips proposals matching all three criteria from `status='proposed'` to `status='expired'`:

1. The proposal's `confidence` is below **0.5**.
2. The proposal was created more than **30 days** ago.
3. No human has acted on it (status is still `proposed`).
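The three criteria can be read as a single predicate. The following is a minimal sketch, not the actual cron code: the `MetadataSuggestion` shape, the `createdAt` field name, the `accepted` / `rejected` status strings, and the `shouldExpire` helper are all assumptions made for illustration. Only `status`, `confidence`, and the `proposed` / `expired` values come from this article.

```typescript
// Hypothetical row shape; only `status` and `confidence` are named in the article.
type SuggestionStatus = "proposed" | "accepted" | "rejected" | "expired";

interface MetadataSuggestion {
  status: SuggestionStatus;
  confidence: number; // model-reported score
  createdAt: Date; // assumed field name
}

const CONFIDENCE_FLOOR = 0.5;
const STALE_AFTER_MS = 30 * 24 * 60 * 60 * 1000; // 30 days in milliseconds

// True only when all three criteria hold at the time of the run.
function shouldExpire(s: MetadataSuggestion, now: Date): boolean {
  const ageMs = now.getTime() - s.createdAt.getTime();
  return (
    s.status === "proposed" &&
    s.confidence < CONFIDENCE_FLOOR &&
    ageMs > STALE_AFTER_MS
  );
}

const now = new Date("2026-05-03T07:00:00Z");
const stale = new Date("2026-03-01T00:00:00Z");

console.log(shouldExpire({ status: "proposed", confidence: 0.3, createdAt: stale }, now)); // true
console.log(shouldExpire({ status: "proposed", confidence: 0.5, createdAt: stale }, now)); // false
console.log(shouldExpire({ status: "rejected", confidence: 0.3, createdAt: stale }, now)); // false
```

Because the job runs weekly, a proposal that crosses the 30-day mark just after a run stays in the queue until the following Sunday, so the effective lifetime is somewhere between 30 and roughly 37 days.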

Expired proposals stop appearing in the review queue but the rows are NOT deleted — they remain in the database as audit-trail evidence that the model proposed a specific suggestion at a specific time. If you query the underlying `metadata_suggestions` table directly (via SQL or the audit log), you'll still find them.

**Why not delete?** Two reasons. First, provenance: being able to demonstrate to an auditor "the model proposed this on date X, the operator did not act, we cleaned up the queue on the first weekly run after X+30" is part of the governance story. Second, re-classifications on a new version of the doc upsert on the `(documentId, kind)` unique constraint, so a fresh proposal naturally overwrites the expired row and storage doesn't grow indefinitely.
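To see why the upsert keeps storage bounded, here is a small in-memory sketch of a store keyed on `(documentId, kind)`. It only illustrates the overwrite behaviour; the real table is a database table with a unique constraint, and the `SuggestionStore` class, the `value` field, and the example kind name are hypothetical.

```typescript
// Hypothetical row shape; `documentId` and `kind` are the constraint columns.
interface SuggestionRow {
  documentId: string;
  kind: string;
  status: "proposed" | "accepted" | "rejected" | "expired";
  confidence: number;
  value: string; // assumed field holding the proposed metadata
}

class SuggestionStore {
  private rows = new Map<string, SuggestionRow>();

  // One slot per (documentId, kind), mimicking the unique constraint.
  private key(documentId: string, kind: string): string {
    return JSON.stringify([documentId, kind]);
  }

  // Insert, or replace whatever row already occupies the slot.
  upsert(row: SuggestionRow): void {
    this.rows.set(this.key(row.documentId, row.kind), row);
  }

  get(documentId: string, kind: string): SuggestionRow | undefined {
    return this.rows.get(this.key(documentId, kind));
  }

  get size(): number {
    return this.rows.size;
  }
}

const store = new SuggestionStore();
store.upsert({ documentId: "doc-1", kind: "keywords", status: "expired", confidence: 0.3, value: "old" });
// Re-classifying a new version of the same doc lands in the same slot.
store.upsert({ documentId: "doc-1", kind: "keywords", status: "proposed", confidence: 0.8, value: "new" });

console.log(store.size); // 1
console.log(store.get("doc-1", "keywords")?.status); // "proposed"
```

The row count tracks the number of distinct `(documentId, kind)` pairs, not the number of classification runs.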

**Why these thresholds?** The 0.5 confidence floor matches the model's self-reported midpoint — above it, the model leans toward the proposal (worth keeping); below it, the model is uncertain by its own admission. The 30-day staleness window is roughly two review cycles for a typical legal / AEC operator. If they haven't acted by then, the proposal is implicitly unwanted.

**Why no per-doc audit event?** Proposals are advisory data, not governance state. Expiring them is operational hygiene — it doesn't change any document's sensitivity, retention, or metadata. Emitting per-doc events would inflate the audit log for no compliance value. The Inngest run-history captures the cross-tenant batch counts (per tenant + per kind) for forensics.
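As an illustration of what "per tenant + per kind" batch counts might look like, the sketch below tallies a batch of expired rows into a nested summary. The `ExpiredRow` shape, the `tenantId` field name, and the `summarizeBatch` helper are assumptions for this example; the actual shape of what the run history records is not documented here.

```typescript
// Hypothetical minimal shape of a row expired by one run.
interface ExpiredRow {
  tenantId: string; // assumed field name
  kind: string;
}

type BatchSummary = Record<string, Record<string, number>>;

// Count expired rows per tenant, then per kind within each tenant.
function summarizeBatch(rows: ExpiredRow[]): BatchSummary {
  const counts: BatchSummary = {};
  for (const { tenantId, kind } of rows) {
    const perKind = counts[tenantId] ?? (counts[tenantId] = {});
    perKind[kind] = (perKind[kind] ?? 0) + 1;
  }
  return counts;
}

const summary = summarizeBatch([
  { tenantId: "tenant-a", kind: "keywords" },
  { tenantId: "tenant-a", kind: "keywords" },
  { tenantId: "tenant-b", kind: "summary" },
]);

console.log(JSON.stringify(summary));
// {"tenant-a":{"keywords":2},"tenant-b":{"summary":1}}
```

A summary of this size is one record per run rather than one per document, which is the trade-off the paragraph above describes.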

**Will this expire proposals I want to keep?** Not if you've acted on them or they scored 0.5 or higher. If you've accepted or rejected a proposal, it's no longer at `status='proposed'` and the cron skips it. If a proposal is at high confidence (≥ 0.5), the cron skips it regardless of age. The cleanup is intentionally narrow, and even an expired proposal is only hidden from the queue, not deleted.

**Future iteration.** If the 0.5 confidence floor proves too aggressive (operators complain about expired proposals they would have acted on) or 30 days proves too short, both thresholds can move. Today they're constants in `packages/workflow/src/functions/metadata-suggestion-expiry.ts`; per-tenant overrides will be added if customers signal that the defaults are wrong.