Pre-trained document type hints

Kodori knows what an NDA, invoice, RFI, or W-9 looks like — a deterministic pattern matcher recognizes the top US Legal / Finance / AEC document types alongside the LLM classifier.

Updated 2026-04-25

Alongside the LLM-driven auto-classifier, Kodori runs a deterministic pattern matcher over each document's display name and the first ~4KB of its extracted text. It recognizes the top US document types across three verticals:

- **Legal:** NDA / confidentiality agreement, engagement letter, master services agreement, discovery letter, subpoena, court filing.
- **Finance:** invoice, purchase order, receipt, expense report, W-9, W-2, 1099, tax return.
- **AEC / construction:** RFI, submittal, change order, meeting minutes, inspection report.
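
To make the idea concrete, here is a minimal sketch of how a matcher of this kind could work: one display-name anchor plus a few body patterns per type, evaluated over the first ~4KB of text. The catalog entries, the regular expressions, the scoring weights, and the names `HINTS`, `HintMatch`, and `match_hint` are all assumptions for illustration, not Kodori's actual catalog or implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Characters, used here as an approximation of the "first ~4KB" of extracted text.
SCAN_LIMIT = 4096


def _compile(*patterns: str) -> list:
    return [re.compile(p, re.IGNORECASE) for p in patterns]


# Hypothetical two-entry catalog; the real hint catalog is not shown in this article.
HINTS = {
    "invoice": {
        "anchor": re.compile(r"\binvoice\b", re.IGNORECASE),
        "body": _compile(r"\bbill to:", r"\bamount due\b", r"\binvoice\s*(no\.?|number|#)"),
    },
    "NDA": {
        "anchor": re.compile(r"\b(nda|non-?disclosure|confidentiality)\b", re.IGNORECASE),
        "body": _compile(
            r"\bconfidential information\b",
            r"\bdisclosing party\b",
            r"\breceiving party\b",
        ),
    },
}


@dataclass
class HintMatch:
    doc_type: str
    confidence: float
    reasoning: str


def match_hint(display_name: str, text: str) -> Optional[HintMatch]:
    """Return the best-scoring hint, or None when nothing matched at all."""
    head = text[:SCAN_LIMIT]
    best: Optional[HintMatch] = None
    for doc_type, hint in HINTS.items():
        anchor_hit = hint["anchor"].search(display_name) is not None
        body_hits = sum(1 for pattern in hint["body"] if pattern.search(head))
        if not anchor_hit and body_hits == 0:
            continue
        # Assumed weighting: 0.6 for the name anchor, 0.2 per body pattern, capped at 1.0.
        # Computed in integer tenths to avoid floating-point rounding surprises.
        tenths = min(10, (6 if anchor_hit else 0) + 2 * body_hits)
        plural = "" if body_hits == 1 else "s"
        if anchor_hit:
            reasoning = (
                f"displayName matches a '{doc_type}' anchor and "
                f"{body_hits} body pattern{plural} also matched"
            )
        else:
            reasoning = f"{body_hits} '{doc_type}' body pattern{plural} matched"
        candidate = HintMatch(doc_type, tenths / 10, reasoning)
        if best is None or candidate.confidence > best.confidence:
            best = candidate
    return best
```

Because the same input always produces the same score and the same reasoning string, a matcher built this way cannot drift between runs. Under the assumed weighting, a body-only match never reaches 0.9 with fewer than five patterns, so the display-name anchor is effectively required for a high-confidence result.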

When the deterministic matcher fires with high confidence (≥0.9) AND the LLM didn't propose a docType, Kodori promotes the deterministic match to the suggestion row. The reasoning column shows the patterns that matched ("displayName matches a 'NDA' anchor and 2 body patterns also matched"). When both fire and disagree, the LLM wins (it has richer text context).
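
The precedence rule above can be sketched as a small decision function. This is an illustration under assumptions: `Suggestion`, `choose_suggestion`, and the `source` field are hypothetical names, and only the 0.9 threshold and the "LLM wins" ordering come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

PROMOTE_THRESHOLD = 0.9  # the "high confidence" cutoff described above


@dataclass
class Suggestion:
    doc_type: str
    source: str  # hypothetical field: "llm" or "deterministic"
    reasoning: str


def choose_suggestion(
    llm_doc_type: Optional[str],
    llm_reasoning: str,
    hint_doc_type: Optional[str],
    hint_confidence: float,
    hint_reasoning: str,
) -> Optional[Suggestion]:
    # Any LLM proposal takes precedence, including when the matcher disagrees.
    if llm_doc_type is not None:
        return Suggestion(llm_doc_type, "llm", llm_reasoning)
    # The deterministic match is promoted only when the LLM proposed nothing
    # and the matcher's confidence clears the threshold.
    if hint_doc_type is not None and hint_confidence >= PROMOTE_THRESHOLD:
        return Suggestion(hint_doc_type, "deterministic", hint_reasoning)
    return None
```

Checking the LLM result first keeps the rule to two branches: the deterministic matcher only ever fills a gap and never overrides a proposal.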

Two practical effects:

1. Unambiguous documents, such as an "Invoice from BigCo.pdf" with a "Bill to:" / "Amount Due" body, are tagged consistently no matter what the LLM does on a given run, so labels drift less over time.
2. The AP-invoice review workflow (/help/ap-invoice-workflow) triggers reliably: when this matcher labels a doc "invoice", the structured-fields extractor fans out automatically.

If your firm's vocabulary differs (you call them "matter intake forms" rather than "engagement letters", or you want to recognize a custom record type like "controlled drawing"), file a feature request. The hint catalog is intentionally small and curated, and we extend it on customer demand. The LLM classifier still picks up types that have no hint; it doesn't NEED one to do its job.