Legal citation extraction

Open any document at /doc/<id>. Below the Notes panel you'll see Legal citations.

If extraction has finished and the document has searchable text, click "Extract citations". Kodori scans the extracted text with a regex set covering seven canonical American legal-citation shapes:

- **Cases**: `<volume> <reporter> <page> (<year>)` against a curated reporter list (U.S., S. Ct., L. Ed., L. Ed. 2d, F., F.2d, F.3d, F.4th, F. Supp., F. Supp. 2d, F. Supp. 3d, F.R.D., B.R., A.2d/3d/4th, P.2d/3d/4th, N.E.2d/3d, N.W.2d/3d, S.E.2d, S.W.2d/3d, So. 2d/3d, Cal. Rptr. 2d/3d).
- **Statutes**: `<title> U.S.C. § <section>` with optional subsection forms (§ 1331(a), § 1332(d)(2)(A)).
- **Regulations**: `<title> C.F.R. § <section>` (e.g. 29 C.F.R. § 1910.146).
- **Procedural rules**: `Fed. R. Civ. P.` / `Crim. P.` / `App. P.` / `Bankr. P.` rule numbers (Fed. R. Civ. P. 26(b)(1)).
- **Evidence rules**: `Fed. R. Evid.` rule numbers (Fed. R. Evid. 803(6)).
- **Dockets**: `No. 21-cv-1234` / `Case No. 22-cm-5` (federal civil/criminal/multi-district forms).
- **Constitutional**: `U.S. Const. art. III, § 2` / `amend. XIV, § 1`.
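To make the shapes above concrete, here is a minimal Python sketch of how three of the seven kinds could be matched. It is an illustration only, not Kodori's actual regex set: the names `REPORTERS`, `CASE_RE`, `STATUTE_RE`, `REGULATION_RE`, and `extract` are hypothetical, and the reporter list is a small subset of the curated one.

```python
import re

# Hypothetical subset of the curated reporter list (assumption for illustration).
REPORTERS = ["U.S.", "S. Ct.", "F.2d", "F.3d", "F.4th", "F. Supp. 2d", "F. Supp."]

# Longest names first so "F. Supp. 2d" is tried before "F. Supp.".
_reporter_alt = "|".join(
    re.escape(name) for name in sorted(REPORTERS, key=len, reverse=True)
)

# <volume> <reporter> <page> (<year>)
CASE_RE = re.compile(
    rf"\b(?P<volume>\d{{1,4}}) (?P<reporter>{_reporter_alt}) "
    rf"(?P<page>\d{{1,5}}) \((?P<year>\d{{4}})\)"
)
# <title> U.S.C. § <section>, with optional subsections such as (d)(2)(A)
STATUTE_RE = re.compile(
    r"\b(?P<title>\d{1,2}) U\.S\.C\. § (?P<section>\d+[a-z]?(?:\([0-9A-Za-z]+\))*)"
)
# <title> C.F.R. § <section>
REGULATION_RE = re.compile(
    r"\b(?P<title>\d{1,2}) C\.F\.R\. § (?P<section>\d+(?:\.\d+)?)"
)

def extract(text):
    """Return (kind, matched_text) pairs, grouped by kind in a fixed order."""
    found = []
    for kind, pattern in (
        ("case", CASE_RE),
        ("statute", STATUTE_RE),
        ("regulation", REGULATION_RE),
    ):
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    return found
```

Anchoring the case pattern to a closed reporter list is one way to get the precision-first behavior: a string like "483 Main St. (1954)" has the right digits and year but no known reporter, so it does not match.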

Repeat appearances of the same citation in one document collapse into a single row with an occurrence count — a brief that cites Brown v. Board, 347 U.S. 483 (1954) twelve times shows ONE row with × 12.
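The collapsing behavior can be pictured as a count keyed on the citation's kind and normalized form. The sketch below assumes a very simple normalization (whitespace collapsing); Kodori's real normalization is not documented here, and `normalize` and `collapse` are hypothetical names.

```python
from collections import Counter

def normalize(raw):
    # Assumed normalization: collapse runs of whitespace and trim the ends.
    return " ".join(raw.split())

def collapse(matches):
    """Turn raw (kind, text) matches into one row per distinct citation."""
    counts = Counter((kind, normalize(raw)) for kind, raw in matches)
    return [
        {"kind": kind, "normalized": normalized, "occurrences": count}
        for (kind, normalized), count in counts.items()
    ]
```

Under these assumptions, twelve matches of the same case citation come back as a single row whose `occurrences` is 12.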

The extractor favors precision over recall — a citation index with 30 real citations and 0 noise beats one with 30 real and 8 noise. International citations (UK, Canadian, EU), state-specific reporter forms outside the canonical list, and bare slip-opinion citations (no reporter) are out of scope for v1; if your firm needs one of these, ask and we'll wire the pattern.

Re-running on the same document is idempotent — Kodori upserts by (tenant, document, fingerprint) where fingerprint is SHA-1 of the normalized form. Use "Re-extract" after a new version uploads to refresh the index.
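A rough sketch of why the re-run is idempotent: if each row is keyed by (tenant, document, fingerprint), writing the same citation again overwrites the existing row rather than adding a second one. The in-memory dict below stands in for the real store, and `fingerprint` and `upsert` are hypothetical names; only the SHA-1-of-normalized-form detail comes from the description above.

```python
import hashlib

def fingerprint(normalized):
    # SHA-1 hex digest of the normalized citation text.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def upsert(index, tenant_id, document_id, normalized, occurrences):
    """Insert or overwrite the row for this citation in this document."""
    key = (tenant_id, document_id, fingerprint(normalized))
    index[key] = {"normalized": normalized, "occurrences": occurrences}
    return key
```

Running `upsert` twice with the same arguments leaves one row in `index`; running it after a new version uploads simply replaces the stored occurrence count.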

The agent can also extract on your behalf: "extract citations from this document" calls extractCitations under the hood. Every run emits a `citations.extracted` event on the document's audit stream — the matter timeline (/collections/<id>/timeline) and global /audit show extraction events alongside every other governance signal.

## Per-matter citation rollup

Once you've extracted citations from individual docs, /collections/<id>/citations aggregates the index across every readable doc pinned to the matter. The group key is (kind, normalized), so the same citation in five docs collapses to one row showing total occurrences across the matter plus a per-doc breakdown of where it appears (each entry links into the per-doc Citations panel via the #citations anchor).

Ranked by total occurrences first, alphabetically second. Filter chips for the seven kinds (case / statute / regulation / rule / evidence / docket / constitutional). Top-of-page stat shows "documents with citations / documents in scope" so you can see if your matter is fully indexed.

Permission-trimmed: docs you can't read drop from the rollup, even if their citations would otherwise contribute. Find the rollup via the "Citations →" affordance on the matter page header alongside "Timeline →" and "Download as ZIP".
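The rollup logic described in this section can be sketched as a group-by followed by a two-key sort. This is an illustration under assumptions: the row shape `(doc_id, kind, normalized, occurrences)`, the `readable_docs` set, and the function name `rollup` are all hypothetical.

```python
from collections import defaultdict

def rollup(rows, readable_docs):
    """Aggregate per-doc citation rows into one row per (kind, normalized)."""
    groups = defaultdict(dict)
    for doc_id, kind, normalized, occurrences in rows:
        if doc_id not in readable_docs:
            continue  # permission trimming: unreadable docs contribute nothing
        per_doc = groups[(kind, normalized)]
        per_doc[doc_id] = per_doc.get(doc_id, 0) + occurrences
    result = [
        {
            "kind": kind,
            "normalized": normalized,
            "total": sum(per_doc.values()),
            "per_doc": per_doc,
        }
        for (kind, normalized), per_doc in groups.items()
    ]
    # Total occurrences descending, then alphabetical on the normalized form.
    result.sort(key=lambda row: (-row["total"], row["normalized"]))
    return result
```

Trimming before grouping means a citation that appears only in unreadable docs never produces a row at all, rather than producing a row with a zero count.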

## Bulk extraction

When the rollup detects unindexed docs in scope, a "Bulk-extract citations" button appears. Click it — Kodori runs the extractor across every readable doc in the collection in one shot. Permission-trimmed; capped at 200 docs per call (re-run to continue past the cap). Idempotent: by default only docs without any indexed citations get extracted. Toggle "Re-extract already-indexed docs" if you want to refresh after a new version cycle.

The bulk run continues past per-doc failures: if 50 docs are in scope and 3 lack extracted text, you get 47 successes plus a banner reporting the 3 skips. Each successful doc emits its own `citations.extracted` audit event with `source: "bulk-extract"`, so the chain attributes per-doc work to the bulk run that triggered it.
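As a sketch of the control flow only, a bulk run with these properties might look like the following. The names `bulk_extract`, `extract_one`, and `already_indexed` are hypothetical, and applying the 200-doc cap after filtering out already-indexed docs is an assumption (it is what lets a re-run continue past the cap).

```python
MAX_DOCS_PER_CALL = 200  # cap stated in the docs above

def bulk_extract(doc_ids, extract_one, already_indexed, reextract=False):
    """Run extract_one over readable docs, collecting successes and skips."""
    # By default only docs with no indexed citations are candidates.
    candidates = [d for d in doc_ids if reextract or d not in already_indexed]
    succeeded, skipped, events = [], [], []
    for doc_id in candidates[:MAX_DOCS_PER_CALL]:
        try:
            extract_one(doc_id)
        except Exception as exc:
            # A per-doc failure is recorded and the run moves on.
            skipped.append((doc_id, str(exc)))
            continue
        succeeded.append(doc_id)
        events.append(
            {"type": "citations.extracted", "document": doc_id, "source": "bulk-extract"}
        )
    return succeeded, skipped, events
```

The `skipped` list is what would feed the banner, and `events` holds one audit entry per successful doc.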

## Tenant-wide search

/citations is the third citation surface — alongside the per-doc panel and the per-matter rollup. Type a citation (case name, U.S.C. section, docket number, or any partial form) and Kodori returns every readable doc citing it ranked by total occurrences. Useful for "find every brief in the firm citing 347 U.S. 483" — the partner-level search.

Backed by the (tenantId, normalized) index — substring-ilike against the normalized form, fast enough for interactive search even on indexes with hundreds of thousands of citations. Permission-trimmed: docs you can't read drop from the results, so a screened attorney sees no citations from matters they're walled off from.
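The search semantics can be approximated in a few lines: a case-insensitive substring test on the normalized form, restricted to readable docs, with results ranked by total occurrences. This is an in-memory sketch, not the database query; the row shape `(doc_id, normalized, occurrences)` and the name `search` are hypothetical, and the empty-query "top 25" view is not modeled.

```python
def search(index_rows, query, readable_docs):
    """Return (doc_id, total_occurrences) pairs for docs citing the query."""
    needle = query.lower()
    totals = {}
    for doc_id, normalized, occurrences in index_rows:
        if doc_id not in readable_docs:
            continue  # screened docs never reach the results
        if needle in normalized.lower():
            totals[doc_id] = totals.get(doc_id, 0) + occurrences
    # Most-cited first; doc id breaks ties so the order is stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Because the readability check happens per row before matching, a screened attorney's results are identical to what they would see if the walled-off docs did not exist.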

When the search input is empty, the page surfaces the top-25 most-cited citations across the tenant as a starting view. Filter chips for the seven kinds with per-kind result counts.

## Citation alerts — be notified when a new doc cites X

Subscribe to a citation at /citations/alerts. Whenever a new document in the workspace runs through the citation extractor and the result contains your citation (substring match on the normalized form), Kodori sends an email to the recipient you configured (defaults to your address; override it for a paralegal team mailbox or shared inbox).

Substring matching catches pinpoint citations: a subscription to `347 U.S. 483` also fires for "Brown v. Board, 347 U.S. 483, 495 (1954)". The optional kind filter narrows an alert to one of the seven citation kinds, for example when you only care about case cites.

Permission-trimmed: an alert only fires when its subscriber still has read access on the source document. Prevents the alert from being a side-channel for restricted-doc disclosure.

Pause an alert during a verbose discovery window; resume when the matter cools off. Hard-delete to fully remove. Each alert tracks fire count + last-fired date so you see which subscriptions are noisy.
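Putting the alert rules together (substring match, optional kind filter, permission check, pause state), the decision of which alerts fire for a newly extracted document could be sketched as below. The `Alert` fields, the `can_read` callback, and counting at most one fire per alert per document are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Alert:
    subscriber: str
    pattern: str               # citation text to look for
    recipient: str             # email address that receives the alert
    kind: Optional[str] = None # optional filter, e.g. "case"
    paused: bool = False
    fire_count: int = 0

def alerts_to_fire(alerts, doc_citations, can_read):
    """Return recipients to notify for one document's (kind, normalized) rows."""
    recipients = []
    for alert in alerts:
        # Paused alerts stay silent; so do alerts whose subscriber
        # can no longer read the source document.
        if alert.paused or not can_read(alert.subscriber):
            continue
        for kind, normalized in doc_citations:
            if alert.kind is not None and alert.kind != kind:
                continue
            if alert.pattern.lower() in normalized.lower():
                alert.fire_count += 1  # assumed: one fire per document
                recipients.append(alert.recipient)
                break
    return recipients
```

Checking `can_read` against the subscriber rather than the recipient mirrors the rule above: access is judged by who owns the subscription, even when the email goes to a shared inbox.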

Every alert lifecycle event (created / paused / resumed / removed / fired) lands on the tenant audit stream, so admins reviewing the chain can answer "who subscribed to what, when, and how many times did it fire?"
