PDF redaction — draw boxes, burn to a new immutable version

Click Redact on any live PDF, draw boxes on each page, then click Burn. Kodori burns the boxes into the PDF bytes and creates a new immutable version. The original is preserved.

Updated 2026-05-26

/doc/[id]/redact is the PDF redaction surface. Pair it with /privilege-log: the privilege log lets you classify what to withhold; the redaction tool lets you produce a partially-redacted copy of a doc that's mostly producible but has a few privileged or PII paragraphs.

**The flow:**

1. Open any live PDF document. Click the **Redact** button next to Download.
2. Each page renders via PDF.js with a transparent overlay.
3. Click and drag on a page to draw a redaction box. The box saves immediately as a pending overlay (mutable: you can remove it via the × button on the box, or add more).
4. When all the boxes are placed, click **Burn redactions to new version**. Confirm. Kodori uses pdf-lib to overlay opaque black rectangles at every box's coordinates, flattens, and creates a new immutable document version. The original version is preserved in the doc's history.
5. The burn page redirects you to /doc/[id] showing the new version. Anyone with read access to the doc going forward sees the redacted version; the audit log shows who could see the unredacted bytes (via the document's version history).

**What gets stored where:**

- *document_redactions* table: pre-burn boxes (mutable). Cleared after a successful burn.
- *document_versions*: append-only version history. The burn creates a new row with previousHash pointing to the version that was redacted.
- *events*: every box add / remove / burn fires a hash-chained audit event. The burn event payload includes the full box list (page, x, y, w, h, label) so a future investigator can reproduce "what was redacted" without ever recovering "what was behind."
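
To illustrate what "hash-chained" means for the events above, here is a minimal sketch in TypeScript. It is not Kodori's implementation: the event type names, the `GENESIS` value, the JSON serialization, and the choice of SHA-256 are all assumptions made for the example. Only the box fields (page, x, y, w, h, label) and the previousHash idea come from the description above.

```typescript
import { createHash } from "node:crypto";

// Box fields follow the burn payload described above.
interface Box { page: number; x: number; y: number; w: number; h: number; label: string }

// Hypothetical event shape; the type names are illustrative.
interface AuditEvent {
  type: "box.add" | "box.remove" | "burn";
  boxes: Box[];
  previousHash: string;
  hash: string;
}

// Assumed placeholder for "no previous event".
const GENESIS = "0".repeat(64);

function hashEvent(type: string, boxes: Box[], previousHash: string): string {
  // Serialize fields in a fixed order so the digest is reproducible.
  const body = JSON.stringify([
    type,
    boxes.map((b) => [b.page, b.x, b.y, b.w, b.h, b.label]),
    previousHash,
  ]);
  return createHash("sha256").update(body).digest("hex");
}

// Append-only: returns a new chain whose last event points at the prior hash.
function appendEvent(chain: AuditEvent[], type: AuditEvent["type"], boxes: Box[]): AuditEvent[] {
  const last = chain[chain.length - 1];
  const previousHash = last ? last.hash : GENESIS;
  const event: AuditEvent = { type, boxes, previousHash, hash: hashEvent(type, boxes, previousHash) };
  return [...chain, event];
}

// Recompute every hash; any edit to an earlier event breaks the chain.
function verifyChain(chain: AuditEvent[]): boolean {
  let expectedPrev = GENESIS;
  for (const e of chain) {
    if (e.previousHash !== expectedPrev) return false;
    if (e.hash !== hashEvent(e.type, e.boxes, e.previousHash)) return false;
    expectedPrev = e.hash;
  }
  return true;
}
```

Because each hash covers the previous one, changing a single coordinate in an old burn payload invalidates that event and every event after it. This is what lets an investigator trust the recorded box list.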

**Privacy scan (AI checklist).** Click the **Privacy scan ✨** button on the redact surface and Kodori loads the doc's extracted text, runs Haiku via the existing model provider, and returns a checklist of redaction candidates across 12 categories: us-ssn, credit-card, bank-account, phone-number, email-address, date-of-birth, street-address, medical-record-number, attorney-client-privileged, attorney-work-product, trade-secret, other-pii. Each candidate carries a verbatim snippet (so you can find it on the page), a one-sentence reasoning, a confidence band (high / medium / low), and a page number when the extractor preserved it. Chips are color-coded by category to speed up triage, and dismissing a card hides a false positive without re-running the scan. The checklist does NOT auto-draw boxes: the operator confirms each candidate by drawing the redaction box manually on the canvas. A scan costs roughly $0.001. Input is capped at 60,000 characters; for longer docs, run the scan, burn the first batch, then re-extract and re-run on the cleaned bytes.
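
The candidate shape and the triage behavior can be sketched as follows. This is an illustration only: the `Candidate` field names, the `id` used for dismissal, the sort order, and the assumption that the 60,000-character cap truncates the input (rather than rejecting it) are not confirmed by Kodori. The category names and confidence bands are taken from the paragraph above.

```typescript
// The 12 categories listed above.
const CATEGORIES = [
  "us-ssn", "credit-card", "bank-account", "phone-number",
  "email-address", "date-of-birth", "street-address", "medical-record-number",
  "attorney-client-privileged", "attorney-work-product", "trade-secret", "other-pii",
] as const;
type Category = (typeof CATEGORIES)[number];
type Confidence = "high" | "medium" | "low";

// Hypothetical candidate shape; field names are illustrative.
interface Candidate {
  id: string;            // assumed card identifier, used for dismissal
  category: Category;
  snippet: string;       // verbatim text to locate on the page
  reasoning: string;     // one-sentence explanation
  confidence: Confidence;
  page?: number;         // present only when the extractor preserved it
}

const SCAN_INPUT_CAP = 60000;

// Assumption: text beyond the cap is cut off, not rejected.
function scanInput(text: string): { input: string; truncated: boolean } {
  return { input: text.slice(0, SCAN_INPUT_CAP), truncated: text.length > SCAN_INPUT_CAP };
}

const RANK: Record<Confidence, number> = { high: 0, medium: 1, low: 2 };

// Hide dismissed cards, then order by confidence and page (unknown pages last).
function visibleCandidates(all: Candidate[], dismissed: Set<string>): Candidate[] {
  return all
    .filter((c) => !dismissed.has(c.id))
    .sort((a, b) =>
      RANK[a.confidence] - RANK[b.confidence] ||
      (a.page ?? Number.MAX_SAFE_INTEGER) - (b.page ?? Number.MAX_SAFE_INTEGER));
}
```

Dismissal is modeled as a client-side filter, which matches the note that hiding a false positive does not re-run the scan.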

**Coordinate system.** Boxes are stored in PDF user-space units (1pt = 1/72 inch), not client pixels, so they stay aligned when a page re-renders at any zoom. pdf-lib uses bottom-up Y; PDF.js uses top-down Y; Kodori converts between the two automatically on save and on render.
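
The conversion can be sketched as a pair of pure functions. This is a simplified illustration, not Kodori's code: the type and function names are hypothetical, and it assumes an unrotated page whose visible area starts at the PDF origin (no crop-box offset). `scale` is rendered pixels per PDF point.

```typescript
// Client-side box: pixels, origin at the top-left of the rendered page.
interface PixelBox { left: number; top: number; width: number; height: number }

// Stored box: PDF points, origin at the bottom-left of the page.
interface PdfBox { x: number; y: number; w: number; h: number }

// On save: divide by the zoom scale, then flip Y so it measures up from the bottom.
function pixelsToPdf(box: PixelBox, scale: number, pageHeightPt: number): PdfBox {
  return {
    x: box.left / scale,
    // The box's bottom edge, measured top-down, becomes its bottom-up y.
    y: pageHeightPt - (box.top + box.height) / scale,
    w: box.width / scale,
    h: box.height / scale,
  };
}

// On render: flip Y back to top-down, then multiply by the current zoom scale.
function pdfToPixels(box: PdfBox, scale: number, pageHeightPt: number): PixelBox {
  return {
    left: box.x * scale,
    top: (pageHeightPt - box.y - box.h) * scale,
    width: box.w * scale,
    height: box.h * scale,
  };
}
```

For example, on a 792pt-tall page rendered at 150% zoom, a box drawn at pixel (150, 150) with size 300 × 75 is stored as x = 100, y = 642, w = 200, h = 50. Rendering that stored box at 200% zoom places it at pixel (200, 200) with size 400 × 100, which is the same region of the page.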

**Permissions.** Anyone who can read a doc can propose redactions today (the audit log is the safety net — every move is recorded). A future iteration adds a separate `document:redact` permission action so workspaces can constrain redaction to a specific role.

**Irreversibility.** The burn is irreversible. The bytes behind the rectangles are removed from the new version, so there is no "un-redact." The original version stays in document_versions for the doc's lifetime, which means the unredacted bytes remain recoverable only by someone with permission to read prior versions.

**Coming next:** in-table label per box ("PII", "Attorney-Client", "Witness identity") rendered as a hover tooltip; multi-doc redaction (apply the same box to a set of similar templates); page-level redaction shortcuts (redact this whole page).