Sensitivity labels and auto-classification

Every document carries a sensitivity tier from public to regulated. The agent proposes a label on upload; you confirm.

Updated 2026-05-26

Sensitivity is a 5-tier label (public / internal / confidential / restricted / regulated) on every document. It's visible everywhere a document appears — dashboard, search, collection lists, doc detail.

When extraction succeeds, Kodori's classifier (Claude Haiku) reads the extracted text and proposes:

- a sensitivity label (the highest plausible tier; defaults to "internal" when unsure)
- a Collection from your existing list, when one fits
- 2–8 keyword phrases a lawyer or PM would actually search
- a doc-type noun phrase (e.g. "NDA", "invoice", "RFI", "meeting notes")
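The four proposals above can be pictured as one small record. The sketch below is illustrative only: `Proposal`, `RawProposal`, and `normalizeProposal` are hypothetical names, and treating a missing or unrecognized tier as "unsure" is an assumption about how the "internal" default might be applied.

```typescript
// Hypothetical shapes: Kodori's real schema is not shown in this article.
const TIERS = ["public", "internal", "confidential", "restricted", "regulated"] as const;
type Sensitivity = (typeof TIERS)[number];

interface Proposal {
  sensitivity: Sensitivity;
  collection: string | null; // null when no existing Collection fits
  keywords: string[];        // capped at 8 phrases
  docType: string;           // e.g. "NDA", "invoice"
}

interface RawProposal {
  sensitivity?: string;
  collection?: string;
  keywords?: string[];
  docType?: string;
}

function isTier(value: string | undefined): value is Sensitivity {
  return value !== undefined && (TIERS as readonly string[]).indexOf(value) !== -1;
}

function normalizeProposal(raw: RawProposal, existingCollections: string[]): Proposal {
  const tier = raw.sensitivity;
  const collection = raw.collection;
  // Trim keyword phrases, drop empty ones, and keep at most 8.
  const keywords = (raw.keywords || [])
    .map((k) => k.trim())
    .filter((k) => k.length > 0)
    .slice(0, 8);
  return {
    // Assumption: a missing or unrecognized tier counts as "unsure".
    sensitivity: isTier(tier) ? tier : "internal",
    // Only suggest a Collection that already exists in the workspace.
    collection:
      collection !== undefined && existingCollections.indexOf(collection) !== -1
        ? collection
        : null,
    keywords,
    docType: (raw.docType || "").trim(),
  };
}
```

In this sketch a proposal naming a Collection that is not in `existingCollections` simply carries `collection: null`, which matches the "when one fits" wording above.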

Proposals show in a "Suggested by Kodori · Agent fills · you confirm" panel on the document. Each proposal has its own Accept / Dismiss buttons. Accepting writes the durable mutation and emits an event, so the audit log records the human decision rather than the agent's suggestion.
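As a rough illustration of the Accept path, the sketch below applies one proposed value to a document and returns the event that would go to the audit log. Everything here is hypothetical (`Doc`, `AuditEvent`, `acceptProposal`, the `"proposal.accepted"` event name); the point is only that the recorded actor is the person who clicked Accept.

```typescript
// Hypothetical names throughout; this is not Kodori's actual event schema.
type Field = "sensitivity" | "collection" | "keywords" | "docType";

interface Doc {
  id: string;
  fields: Partial<Record<Field, unknown>>;
}

interface AuditEvent {
  type: "proposal.accepted";
  documentId: string;
  field: Field;
  value: unknown;
  actor: string;      // the human who clicked Accept
  proposedBy: string; // who made the suggestion (the agent)
  at: string;         // ISO 8601 timestamp
}

function acceptProposal(
  doc: Doc,
  field: Field,
  value: unknown,
  actorEmail: string,
  now: Date = new Date()
): { doc: Doc; event: AuditEvent } {
  // Copy the fields so the caller's document object is left untouched.
  const fields: Partial<Record<Field, unknown>> = { ...doc.fields };
  fields[field] = value;
  const event: AuditEvent = {
    type: "proposal.accepted",
    documentId: doc.id,
    field,
    value,
    actor: actorEmail,
    proposedBy: "agent",
    at: now.toISOString(),
  };
  return { doc: { id: doc.id, fields }, event };
}
```

Returning the updated document and the event together keeps the mutation and its audit record in one step, so one cannot be written without the other in this sketch.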

**Download watermarking on confidential+ PDFs.** When a workspace member downloads a PDF at sensitivity ≥ confidential, Kodori stamps every page on the fly with three layers:

- the workspace name and tier in the top header bar
- a diagonal CONFIDENTIAL / RESTRICTED / REGULATED stamp in ochre at 18% opacity across the page center (the text tracks the actual tier, so a regulated doc reads visibly differently from a confidential one)
- a footer reading "Downloaded by <email> on YYYY-MM-DD · Kodori"

This closes the chain-of-custody gap where an internal user could download a regulated doc without any visible mark: every screenshot or forwarded copy traces back to the originating download via the email and date in the footer.

Public and internal docs keep the cheap 302-redirect-to-R2 path (no Vercel egress); confidential+ traffic streams the watermarked bytes back through the route. Watermarking costs roughly 15-30 ms per page via pdf-lib. The stamping is wrapped in a defensive try/catch, so a corrupted PDF falls back to the unchanged bytes rather than failing the download.
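The routing rule, layer text, and fallback can be sketched without any PDF library. The code below is a minimal illustration, not Kodori's route handler: all function names are hypothetical, the header format and the UTC date are assumptions, and the actual page stamping (done with pdf-lib in the product) is represented by an injected `Stamper` function.

```typescript
type Sensitivity = "public" | "internal" | "confidential" | "restricted" | "regulated";

const RANK: Record<Sensitivity, number> = {
  public: 0, internal: 1, confidential: 2, restricted: 3, regulated: 4,
};

// Tiers at or above "confidential" take the watermarking path.
function needsWatermark(tier: Sensitivity): boolean {
  return RANK[tier] >= RANK.confidential;
}

// Public and internal docs redirect to storage; the rest stream through the route.
function planDownload(tier: Sensitivity): "redirect" | "stream" {
  return needsWatermark(tier) ? "stream" : "redirect";
}

interface Layers { header: string; stamp: string; footer: string; }

function buildLayers(workspace: string, tier: Sensitivity, email: string, when: Date): Layers {
  const date = when.toISOString().slice(0, 10); // YYYY-MM-DD, assuming UTC
  return {
    header: `${workspace} · ${tier}`, // assumed format for "workspace name + tier"
    stamp: tier.toUpperCase(),        // stamp text tracks the actual tier
    footer: `Downloaded by ${email} on ${date} · Kodori`,
  };
}

// Stand-in for the real page stamper; hypothetical signature.
type Stamper = (pdf: Uint8Array, layers: Layers) => Promise<Uint8Array>;

// If stamping throws (for example on a corrupted PDF), return the bytes unchanged.
async function watermarkOrPassThrough(
  pdf: Uint8Array,
  layers: Layers,
  stamp: Stamper
): Promise<Uint8Array> {
  try {
    return await stamp(pdf, layers);
  } catch {
    return pdf;
  }
}
```

Passing the stamper in as a parameter keeps the tier check and the fallback testable on their own; a real implementation would supply a function that draws the three layers on each page.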