Features · 217 capabilities, eight pillars
Everything Kodori does — in one page.
This is the canonical feature list for Kodori, the AI-native document management system by KumoKodo. If you’re comparing us against iManage, NetDocuments, FileHold, SharePoint, Documentum, or any incumbent DMS — read this page top to bottom and ask us where the gaps are.
Pillar 01
Find anything, fast
Search the way people actually ask questions. Plain language, exact phrases, or pure concept — Kodori runs all three paths and fuses the answers.
Hybrid search (keyword + semantic, RRF-fused)
Postgres full-text and pgvector embeddings run in parallel and combine via Reciprocal Rank Fusion. Each hit shows whether it came from keyword, semantic, or both — so you can see why a document surfaced. Sensitivity-narrowed queries (confidential / restricted / regulated) hit a partial GIN index ~1/10th the size of the full index (D285) — the compliance hot path stays sub-second at 100M-doc-tenant scale. For tenants with 1M+ documents, a non-blocking inline tip (D286) nudges operators toward filtered queries when the search would otherwise scan unfiltered. Recency-narrowed queries hit a partial HNSW index over the trailing-year working set (D291) — the matter/project/"what changed?" hot path stays sub-second even at billion-vector scale. Per-tenant P95 search latency (D290) is sampled at 10% and surfaced on /admin/queue-depth — operators see "search is over the comfort zone" before users notice.
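As a rough illustration of the fusion step described above, here is a minimal Reciprocal Rank Fusion sketch. The function name, the input shape (two ranked lists of document ids), and the constant k=60 are assumptions for illustration; the page does not state which constant Kodori uses.

```python
from collections import defaultdict


def rrf_fuse(keyword_ids, semantic_ids, k=60):
    """Fuse two ranked id lists; k=60 is a commonly used default, assumed here."""
    scores = defaultdict(float)
    sources = defaultdict(set)
    for source, ranking in (("keyword", keyword_ids), ("semantic", semantic_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it returned.
            scores[doc_id] += 1.0 / (k + rank)
            sources[doc_id].add(source)
    # Highest fused score first; ties broken by id for a stable order.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return [(d, scores[d], sorted(sources[d])) for d in ordered]


fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
```

A document returned by both paths accumulates two contributions, which is why "b" outranks "a" here even though "a" was the top keyword hit. The third tuple element carries the per-hit provenance (keyword, semantic, or both).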
Unified search — documents, conversations, and audit events
/search has a "Search in" pill row toggling Documents / Conversations / Audit. Documents stays the default; flipping the other pills fans out three searches in parallel and renders three labelled sections. Each non-doc hit deep-links into its surface (drawer history for conversations, /audit for events). Source param preserved in the URL so a specific combination is shareable.
Saved searches with new-since-last-viewed badges
Name your common queries with their filters. Each saved-search chip carries a numeric badge counting documents matching the query that landed since you last opened it — clearer than email digests, no inbox noise. Click and the badge clears as a side effect.
Search filters by sensitivity and MIME family
Narrow to regulated records only, or only PDFs, or only Office files. Filters apply to the fused result list so you don't lose semantic-only matches by filtering.
Snippets that highlight matched terms
Every keyword hit ships a ts_headline-generated excerpt with the matched phrase wrapped in <mark> — so you can confirm relevance without opening the document.
Permission-trimmed at the index
Search runs canReadDocument(ctx) inside the SQL query, not as a post-filter. A viewer never sees a hit teaser for a record they can't open. Deny rules always win over allow rules. The deny-wins gate path is index-backed at every step (D284) — the collection-membership reverse-lookup index keeps permission trimming O(log n) at any tenant size, including 100M-doc tenants.
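The difference between trimming inside the query and post-filtering can be sketched with an in-memory database. This is an illustrative sketch only: SQLite stands in for Postgres, and the `documents` / `grants` tables and their columns are hypothetical, not Kodori's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE grants (doc_id TEXT, principal TEXT, effect TEXT);
INSERT INTO documents VALUES ('d1', 'NDA'), ('d2', 'Invoice'), ('d3', 'Memo');
INSERT INTO grants VALUES
  ('d1', 'bob', 'allow'),
  ('d2', 'bob', 'allow'),
  ('d2', 'bob', 'deny');   -- a deny on d2 must beat the allow
""")


def readable_hits(principal):
    """Return only ids the principal may read; the gate lives in the WHERE clause."""
    rows = conn.execute(
        """
        SELECT d.id FROM documents d
        WHERE EXISTS (SELECT 1 FROM grants g
                      WHERE g.doc_id = d.id AND g.principal = ? AND g.effect = 'allow')
          AND NOT EXISTS (SELECT 1 FROM grants g
                          WHERE g.doc_id = d.id AND g.principal = ? AND g.effect = 'deny')
        ORDER BY d.id
        """,
        (principal, principal),
    ).fetchall()
    return [r[0] for r in rows]
```

Because the deny check is part of the query, a denied or ungranted row never leaves the database, so no teaser can be rendered for it.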
Bulk operations on results — collection, retention, trash
Multi-select checkboxes on every result row. With one or more selected, a sticky action bar at the top of the list lets you add to a collection, apply a retention class (admin-only), or trash with a required reason — in a single click. The 200-doc invoice batch that used to be 200 trips through /doc/[id] is now one trip through /search. Each iteration goes through the same per-doc MCP tool, so permission gates and audit-log shape are identical to the single-doc UI.
Bulk-apply retention to a whole collection — optionally narrowed by content type
On any collection page, admins get an "Apply retention class to this collection" form: pick a class, optionally check a few content types to narrow the scope (e.g. only the PDFs in a matter), hit Apply. Backfills override existing per-doc classes — that's the right semantic when you're standardizing a matter on one schedule — and each affected doc emits its own hash-chained audit event. Legal-held docs are unaffected. Every override is reversible from the doc's history. The "applied to N of M scanned" banner shows the implicit MIME filter narrowed correctly. Same MIME filter is now available on the bulk MCP tools (collection-add, set-retention, set-sensitivity), so the agent can do the same thing from natural language: "set 7-year retention on every PDF in the Smith matter."
Inline PDF viewer with find-in-document
PDFs render through pdfjs (the engine Firefox uses internally) on every doc detail page — consistent UX across Chrome, Safari, Firefox, Edge instead of variable browser-native viewers. Toolbar carries page navigation, zoom, and Cmd-F / Ctrl-F find-in-document that walks every page's text and surfaces per-hit page snippets. Selection + copy work natively. Lazy-loaded, so the dashboard bundle stays slim for users who never open a PDF.
Inline Word + Excel preview rendering
.docx renders with formatting via mammoth — paragraphs, lists, tables, basic styling preserved. .xlsx / .xls / .ods renders the first 10 sheets as HTML tables. Operators read contracts + pricing models on /doc/[id] without opening Word or Excel. Loads in a tightly-sandboxed iframe with strict CSP; mammoth + SheetJS lazy-loaded server-side so deployments that never preview Office docs don't pay the parse-tree cost. Closes the major UX gap on Office formats.
TIFF + HEIC inline preview — server-side conversion to PNG
Chrome / Firefox / Edge can't render TIFF / BMP / HEIC / HEIF in <img> tags inline (Safari is the only mainstream exception). The preview endpoint detects those formats, decodes the source bytes via sharp (libvips + libheif), bounds to 2048px on the long edge, and returns inline as PNG. Source TIFF stays untouched in storage; Download still serves the authoritative format. Multi-page TIFFs (legal fax-scan format) preview page 1; the agent + search still see every page (D296 wraps multi-page sources into a PDF for Claude vision). X-Kodori-Preview-Converted-From debug header surfaces the conversion source for operator forensics. Closes Roy's Task #158. See D297.
/browse — three-pane explorer (collections tree → docs → metadata)
FileHold-inspired tri-pane that preserves Kodori's collections-as-views architecture (no physical folder tree). Left: collections grouped by kind (drawer / cabinet / matter / project / smart). Center: documents in the selected collection with sortable columns. Right: metadata for the selected doc. URL state in `?collection=<id>&doc=<id>` so links and back/forward work — deep-link "the engagement letter inside the Smith matter" from agent chat or email. The familiarity-with-incumbents win without giving up the architectural difference. See D314.
Two-pane /doc/[id] — preview sticky left, info flows right
Doc-detail pages show the preview at lg:col-span-7 with `lg:sticky lg:top-6 lg:self-start` and access + metadata + retention + audit + links at lg:col-span-5. When you scroll the right pane to look at audit history or retention, the preview stays visible so you keep your visual anchor. Below `lg` it falls back to single-column for tablet + mobile. See D311.
Sidebar quick-access — Recent uploads + Saved searches
Two NavGroups at the top of the sidebar. "Recent uploads" shows your last 8 documents (createdBy=me, ordered most-recent-first) so you can jump back to whatever you just uploaded without /search. "Saved searches" shows your top 5 saved queries. Both lazy-load on layout render. See D313.
/extraction-issues — drill into stuck files with one click
The dashboard's "X stuck files" / "X unsupported files" hints are now clickable. /extraction-issues opens with three filter tabs (Unsupported / Failed / Stuck), permission-trimmed via canReadDocument. Click any row to jump to /doc/[id] for the right action. The self-service answer to "what is unsupported and why?" See D312.
Pillar 02
Ingest, extract, classify
Get documents in, and let Kodori do the metadata work. Drag-drop, email forwarding, or programmatic upload — every path runs the same extraction pipeline.
Drag-and-drop upload, including folders
Drop a single file, fifty files, or an entire folder hierarchy onto /upload. Bytes are hashed in your browser and uploaded directly to object storage — no Vercel 4.5 MB body limit. Folder uploads can become Collections in one click.
Mobile capture from your phone's camera + voice-note dictation
Open /capture on a phone and tap "Take photo" — the rear camera opens directly via the HTML5 capture attribute, no Kodori app to install. Or tap "Record voice note" right next to it — MediaRecorder captures audio (audio/webm on Chrome / Firefox / Android, audio/mp4 on iOS), the live duration counter tracks recording time, and "File + transcribe" uploads through the same presigned-URL pipeline as photos. The new whisper-transcribe extractor in the workflow registry routes any audio/* MIME type to OpenAI Whisper at $0.006/min and the transcript becomes the document's searchable text — auto-classify then produces an aiSummary, sensitivity, collection suggestion, and keyword tags exactly as it would for a PDF. Replaces the legacy "scan to email, file later" workflow for paralegals at courthouses, AEC superintendents at job sites, and partners dictating depositions in the back of an Uber. Installable as a PWA — long-press the home-screen icon for a Capture shortcut.
Browser-side perspective correction — capture a document at an angle, get a clean rectangle
When /capture detects document corners in a still photo (opencv.js, lazy-loaded ~9MB WASM only on first capture session), it applies a 4-point perspective warp before upload. The result lands on the server already de-skewed — no more trapezoidal contracts in your archive. Falls back to passthrough on detection-failure or unsupported browser without blocking the upload, so a rushed capture never costs the user the doc. Same downstream pipeline (extraction + classification + DLP) as a flat-bed scan.
Offline mobile capture buffer — IndexedDB + Background Sync
Captures from a phone on a job site or in a courthouse basement queue locally when offline. Service worker registers a Background Sync `capture-drain` tag; the browser fires the sync when connectivity returns, the worker reads the queue, and POSTs every row to /api/v1/documents with original metadata (mime type, display name, sensitivity, collection, captured-at timestamp). 201 deletes the row; non-201 leaves it for the next sync attempt (Background Sync's exponential backoff handles cadence). Foreground drain helper covers iOS Safari which doesn't support Background Sync yet — captures still get filed when the user returns to the page.
Cloud OCR cascade — Azure Document Intelligence + Google Document AI as configured fallbacks
Extractor registry runs in order: Azure Document Intelligence (prebuilt-layout) → Office adapters (.docx / .xlsx / .pptx) → Adobe Illustrator detection → Whisper audio transcription → Google Document AI → Claude PDF (vision) → built-in text. Azure adapter does the live REST call (POST `/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-11-30` + Operation-Location poll, paragraph-flattened text + first-word locale). Google adapter calls @google-cloud/documentai with three auth modes (inline service-account JSON, keyfile path, or ADC). Each cloud adapter self-reports supports=false until its env vars are set, so the registry walks past it cleanly when unconfigured — first call after provisioning Just Works without code changes. Claude PDF stays the LLM-driven fallback when no cloud OCR is configured.
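The "walk past unconfigured adapters" behaviour described above might look like the following sketch. The registry here is abbreviated, the environment variable names are hypothetical, and apart from `whisper-transcribe` the adapter names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Extractor:
    name: str
    supports: Callable[[str, Mapping[str, str]], bool]


def _azure_supports(mime, env):
    # Hypothetical env var names: the adapter reports False until both are set.
    configured = bool(env.get("AZURE_DI_ENDPOINT") and env.get("AZURE_DI_KEY"))
    return configured and (mime == "application/pdf" or mime.startswith("image/"))


# Order matters: the first adapter that reports support wins.
REGISTRY = [
    Extractor("azure-document-intelligence", _azure_supports),
    Extractor("whisper-transcribe", lambda mime, env: mime.startswith("audio/")),
    Extractor("claude-pdf", lambda mime, env: mime == "application/pdf"),
    Extractor("built-in-text", lambda mime, env: mime.startswith("text/")),
]


def pick_extractor(mime: str, env: Mapping[str, str]) -> Optional[str]:
    for extractor in REGISTRY:
        if extractor.supports(mime, env):
            return extractor.name
    return None  # no adapter claims this MIME type
```

With an empty environment a PDF falls through to the LLM-driven fallback; once the two variables are present, the same call resolves to the cloud adapter with no code change.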
Email-to-DMS, one address per workspace
Forward email to your tenant's ingest address. The body becomes one document, every attachment becomes its own document, all auto-classified. Cloudflare Email Workers route the message through HMAC-signed delivery to the inbound API.
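HMAC-signed delivery generally means the sender attaches a keyed digest of the request body and the receiver recomputes it before trusting the message. The sketch below assumes HMAC-SHA256 with a hex-encoded signature and a shared secret; the page does not specify the hash, encoding, or header Kodori uses.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (algorithm assumed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_delivery(secret: bytes, body: bytes, signature: str) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_body(secret, body), signature)


secret = b"example-shared-secret"
body = b'{"from": "client@example.com", "attachments": 2}'
signature = sign_body(secret, body)
```

The receiver should verify against the exact bytes it received, before any JSON parsing, since re-serialising the body can change the digest.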
Excel + PowerPoint add-ins — save workbooks and decks
Sister add-ins to the Word and Outlook task panes. Excel save-back is for pricing models, budgets, matter-tracking sheets — each iteration becomes a Kodori version with diff against prior versions. PowerPoint save-back is for pitch decks, matter-status presentations, trial exhibits. Same Office.js + roamingSettings pattern, same one Kodori API key, same New-doc / New-version dual-mode UX. Slice-by-slice document read with progress UI; 50MB cap matches the bulk-ingest API.
Word add-in — save drafts as new docs or new versions
Install the Kodori task pane in Word (desktop, web). Click "Save to Kodori" on the Home ribbon and the open .docx is filed in one of two modes: New document (lands as a brand-new Kodori record) or New version (search for an existing Kodori doc, then save the open .docx as its next version, with an optional label like "Final draft" or "Sent to counsel"). Same Office.js + API-key flow as the Outlook add-in; one Kodori API key signs in to both. Slice-by-slice document read with progress UI; 50MB cap matches the bulk-ingest API.
Outlook add-in — file email and attachments from the ribbon
Install the Kodori task pane in Outlook (desktop, web, mobile). Open any email, click "File to Kodori", and the message body plus selected attachments land as Kodori documents — auto-classified, DLP-scanned, ready for one-click filing on the dashboard. Per-filing sensitivity override (set "regulated" for one privileged email without sticking), in-pane Collection picker (assign filed message + attachments to a matter / project in one shot), and a thread tracker that shows "N already filed from this thread" so you never re-file. Manifest URL self-points at your deployment so a self-hosted Kodori works without a build step. Cloud-link attachments are flagged unfilable; file-type attachments file individually with per-tile status. API key stored in Outlook's roamingSettings, synced across devices.
Content-addressable storage (SHA-256)
The blob key IS the content hash. Identical files dedupe automatically — uploading the same NDA twice never creates a second copy.
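A minimal sketch of content-addressable storage, assuming an in-memory dict as the blob store (the class and method names are hypothetical): the key is derived from the bytes, so a second upload of the same file maps to the same slot.

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: key = SHA-256 hex digest of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # setdefault leaves an existing blob untouched, so duplicates are free.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
first = store.put(b"mutual NDA v1")
second = store.put(b"mutual NDA v1")   # same bytes, same key, no new copy
other = store.put(b"mutual NDA v2")
```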
Extraction across PDFs, images, Office, text, .ai, TIFF + HEIC
PDFs and PNG / JPEG / GIF / WebP via Anthropic vision; Office formats (.docx / .xlsx / .pptx) via pure-JS adapters; text-shaped formats decode in milliseconds; Adobe Illustrator .ai files sniff for the embedded PDF and route accordingly. TIFF / BMP / HEIC / HEIF are decoded by `sharp` (libvips + libheif), normalized to PNG (single-page) or wrapped into a multi-page PDF (multi-page legal fax-scan TIFFs), then sent to Claude vision — every fresh tenant gets text out of legal scans + iPhone-camera photos without needing Azure or Google DocAI provisioned. When cloud OCR (Azure / Google DocAI) is configured, those still win for raster formats — they handle TIFF natively + are cheaper per page at scale. See D296.
Auto-classification (sensitivity + collection + keywords + doc type)
After extraction, Claude Haiku proposes metadata. Each suggestion has its own Accept / Dismiss button — accepting writes the durable mutation and emits a human-decision event. The model proposes, you decide.
Inline metadata editor on every doc page (D307)
The /doc/[id] page renders every top-level metadata key in a sortable key/value table — values type-aware (arrays of scalars as chips, objects in collapsible details, scalars plain). Add / update via the form at the bottom; empty value deletes the key. The value field parses as JSON first then falls back to a literal string, so `["Smith","Jones LLC"]` lands as a real array but typing `Smith Holdings LLC` without quotes still works as a string. Same audit shape as the agent + REST + bulk paths — one `document.metadata-set` event per changed key. Permission gate: creator OR tenant admin / owner.
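The "JSON first, literal string second" rule for the value field can be sketched as follows. The function name and the `DELETE` sentinel are hypothetical; only the parsing order and the empty-value-deletes behaviour come from the description above.

```python
import json

DELETE = object()  # hypothetical sentinel meaning "remove this key"


def parse_metadata_value(raw: str):
    if raw == "":
        return DELETE
    try:
        # Valid JSON wins: arrays, objects, numbers, booleans, quoted strings.
        return json.loads(raw)
    except json.JSONDecodeError:
        # Anything else is kept verbatim as a plain string.
        return raw
```

Under this rule `42` becomes a number and `"42"` (with quotes) becomes a string, which is worth knowing when a matter number must stay textual.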
Pre-trained doc-type hints (top-20 US Legal / Finance / AEC)
Alongside the LLM classifier, a deterministic pattern matcher recognizes the top 20 US document types, including NDA, MSA, engagement letter, subpoena, court filing, invoice, PO, receipt, W-9 / W-2 / 1099, tax return, RFI, submittal, change order, meeting minutes, and inspection report. Unambiguous documents get tagged consistently without waiting on a model call.
DLP scanning on ingest with auto-escalation
Every uploaded document is pattern-scanned for US SSNs, Luhn-validated credit-card numbers, ABA-validated routing numbers, MRN identifiers, AWS access keys, GitHub tokens, PEM private-key blocks, JWTs, and generic API secrets. High-confidence findings auto-escalate sensitivity to "regulated" before the document is searchable. Medium-confidence findings surface for human review with confirm / dismiss buttons. The matched value is never stored — only a pre-redacted preview.
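Two of the validators named above are standard checksums and can be sketched directly. The function names and the 13-digit minimum length are assumptions; the Luhn algorithm and the ABA 3-7-1 weighting are the well-known public definitions, not Kodori-specific code.

```python
def luhn_valid(candidate: str) -> bool:
    """Luhn check used for payment-card numbers; separators are ignored."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) < 13:  # assumed minimum card length for this sketch
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:          # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def aba_valid(candidate: str) -> bool:
    """ABA routing checksum: weights 3, 7, 1 repeated across nine digits."""
    if len(candidate) != 9 or not candidate.isdigit():
        return False
    weights = (3, 7, 1, 3, 7, 1, 3, 7, 1)
    return sum(w * int(c) for w, c in zip(weights, candidate)) % 10 == 0
```

Checksum validation is what lets a scanner treat a 16-digit run as a high-confidence card number rather than an arbitrary identifier.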
Bring-your-own KMS encryption key
Register a customer-managed Key Encryption Key in your AWS / Azure / GCP account. Every blob your tenant uploads gets a fresh DEK wrapped against your KEK; revoking the key instantly makes every blob unreadable, regardless of where the encrypted bytes live. Default tenants get envelope encryption out of the box via a deployment-managed key.
Audit log CSV + JSONL export with date + actor + stream filters
Narrow the audit log by event-type chip (~15 grouped categories covering every event the system emits), date range, actor (substring match across emails, agent IDs, system principals), and stream (substring match against streamId — paste share-link/, legal-hold/<id>, or any partial id to filter to one matter / share / hold's lifecycle). Then export the filtered set to RFC 4180 CSV for legal review / external-auditor handoff OR to JSONL for SIEM ingestion (Splunk HEC, Datadog Logs, Sumo Logic). JSONL preserves the payload jsonb as a real nested object — no inner JSON.parse on the payload field. Both formats share filter URL state. Filters are sticky; URL state is preserved so a specific filter combo is shareable + scriptable.
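The practical difference between the two export formats is how the payload travels. In the sketch below the event fields, the stream id, and the payload key names are hypothetical; `collection.renamed` is used only because it is an event type named elsewhere on this page.

```python
import csv
import io
import json

events = [
    {
        "type": "collection.renamed",
        "actor": "counsel@firm.example",
        "stream_id": "collection/abc",
        "payload": {"previous": "Smith, J.", "next": "Smith matter"},
    }
]


def to_csv(rows) -> str:
    buf = io.StringIO(newline="")
    # csv.writer's default dialect emits CRLF and quotes fields containing
    # commas or double quotes, which matches RFC 4180 conventions.
    writer = csv.writer(buf)
    writer.writerow(["type", "actor", "stream_id", "payload"])
    for e in rows:
        writer.writerow([e["type"], e["actor"], e["stream_id"],
                         json.dumps(e["payload"], sort_keys=True)])
    return buf.getvalue()


def to_jsonl(rows) -> str:
    # One JSON object per line; payload stays a nested object.
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in rows)


csv_text = to_csv(events)
jsonl_text = to_jsonl(events)
```

A CSV consumer gets the payload as one quoted text cell and must parse it again, while a JSONL consumer can read `payload.next` straight from the decoded line.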
/audit "since my last visit" chip — admin returning from PTO catches up on missed events
Per-user state captured at /audit page-load. Click the chip to filter to events appended since your previous visit; the chip label shows the prior-visit timestamp so you know exactly what window you're looking at. Hidden on first-ever visit (no prior baseline). Plays well with the existing type / date / actor / stream filters — they AND together so "since my last visit AND types: legal-hold.*" is a one-click query. Per-user (NOT shared with other admins) — your colleague's catch-up visit doesn't reset your baseline. Excluded from CSV / JSONL exports because the download's downstream audience may not be the same user. New `last_audit_visit_at` column on users; updated best-effort after the page renders so auth correctness doesn't depend on the stamp write.
Inline diff badges on /audit — "confidential → restricted" at-a-glance
Mutation events carrying a from/to in their payload — tenant settings updates, retention class changes, sensitivity shifts, API key scopes / expiration / rate limit, webhook retry policy — surface a compact <from> → <to> chip directly in the /audit row summary. Read the change at-a-glance instead of expanding the row to read the JSON. Multi-field events (tenant.settings-updated) render up to 3 chips with the field prefixed (`scopes: x → y`); single-value diffs are unprefixed. Long values truncate at 30 chars with a hover tooltip. Non-mutation events render unchanged — the extractor returns nothing when there's no recognized diff shape, so the inline-badge presence itself is a signal that "this event changed something." Per-event-type extractor (NOT a generic JSON-walk) so unrelated `from` fields in non-diff events (e.g. invitation events) don't false-positive.
Saved /audit filter presets — admins stop re-building the same filter daily
Apply your filters on /audit, click "+ Save current filter", give it a name. The saved preset appears as a chip at the top of the filter card; one click loads the same combination. Per-user, per-tenant — your "monthly SOC 2 review" preset doesn't leak to colleagues, and theirs don't crowd your bar. 50-preset cap per user with × deletion on each chip; empty filter sets are refused so saved presets always carry signal. Chips are real <Link>s — middle-click opens in new tab, browser back/forward works, no JS needed for the load path. Hover any chip to see the saved-filter description ("3 types · 2026-04-01 → 2026-04-30 · actor ~ counsel@firm").
Workspace export (GDPR Article 20 + audit handoff)
Owner-only "everything we have on you" zip — every readable document plus structured exports of collections, retention classes, legal holds, members, the full hash-chained audit log, and the requester's own agent conversations. Permission-trimmed via the same canReadDocument gate the search index uses. Caps: 1000 docs · 5 GB · 100k audit events; manifest flags any cap that tripped. Built for GDPR portability, pre-migration backup, and SOC 2 evidence handoff.
SOC 2 controls mapping at /security/controls
Every AICPA Trust Services Criterion (CC1 through CC9 plus the Confidentiality additional category) annotated with the concrete Kodori implementation and a pointer to where evidence lives in the running product. 36 controls in total, each tagged Live today / Roadmap / On audit engagement. Designed as the first thing your security review hands an auditor — every "Live today" row is verifiable against the running product before a contract is signed.
Section-by-section conformance documents at /legal/*
Four buyer-grade compliance documents published as live HTML — GDPR / UK-GDPR / CCPA Article-by-Article rights mapping at /legal/gdpr, FDA 21 CFR Part 11 conformance claim covering Subparts B + C at /legal/21-cfr-part-11, EU AI Act Articles 11 / 12 / 14 / 50 disclosure at /legal/ai-disclosure, and SEC 17a-4(f)(3) audit-trail-alternative posture at /legal/sec-17a-4. Each document maps Kodori controls to specific regulation sections so security-review buyers can verify against the running product before a contract is signed. Replaces the prior practice of marking these as "phase-N" — the substrate has been live for many shipments; what was missing was the buyer-readable doc surface.
Workspace overview — five-second executive read
New /overview page pulls every signal Kodori already tracks into one composite view: document totals + 24h/7d ingest velocity, active legal holds, retention review depth, audit chain tip + last verification, AP queue health (pending / variance / awaiting receipt), agent activity in last 24h, anomalies + cap utilization (admin-only). Each tile color-coded by urgency; deep-links into the operational page for each area. Differentiates from the /dashboard daily-action surface — same data sources, different audience (partner / GC / compliance officer who wants a steady-state read, not a daily action queue). Live counts, no batch snapshot.
Access explorer — who can see what, queryable two ways
New /access page (owner / admin only) is the current-state view of every grant in the workspace. Pick a member to see what they have access to (offboarding, least-privilege review, post-incident audit). Or paste a document / collection id to see every grant scoped to it (partner review of privileged matters, pre-deposition exhibit access, quarterly attestation). Differentiates from /members (roles + invites) and /audit (mutation events) — /access is the live grant table queryable two ways. The auditor's "show me who can see this" question, answered in one click.
Auditor-ready compliance reports at /compliance/reports
Five pre-baked reports for the auditor's working papers: retention disposal log, legal-hold log, audit-chain verification log, DSAR fulfillment log, SOC 2 control evidence map. Each backed by the live audit log + projection tables — point-in-time, no batch snapshot to explain. One-click RFC-4180 CSV export. Auditor pastes into their working papers, no SQL queries hand-built against backup snapshots. Differentiates Kodori vs incumbents (iManage, NetDocuments, FileHold) where compliance evidence is a multi-week extraction project.
Live audit-chain integrity verifier
Owner / admin button on /audit re-runs SHA-256 over every event in the tenant's chain and confirms each prev_hash matches its predecessor. Pass returns the count walked plus latest event timestamp ("verified through 2026-04-26T18:14:02Z"). Fail returns a rich diagnostic of the first mismatch. The single biggest claim Kodori makes about tamper-evidence — demonstrable in one click. Sales asset for SOC 2 / 21 CFR Part 11 / e-discovery prospects.
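The walk performed by such a verifier can be sketched as below. The event layout, the genesis value, and the exact bytes fed to SHA-256 (previous hash concatenated with canonical JSON of the payload) are assumptions for illustration; the page only states that SHA-256 is recomputed and each prev_hash is checked against its predecessor.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload,
                  "hash": event_hash(prev, payload)})


def verify_chain(chain: list) -> dict:
    prev = GENESIS
    for index, event in enumerate(chain):
        linked = event["prev_hash"] == prev
        intact = event["hash"] == event_hash(event["prev_hash"], event["payload"])
        if not (linked and intact):
            return {"ok": False, "first_mismatch": index}
        prev = event["hash"]
    return {"ok": True, "walked": len(chain)}


chain = []
for n in range(3):
    append_event(chain, {"seq": n, "type": "document.read"})
clean = verify_chain(chain)
chain[1]["payload"]["type"] = "document.deleted"   # simulate tampering
tampered = verify_chain(chain)
```

Editing any stored event changes its recomputed hash, so the walk stops at the first altered position instead of reporting a generic failure.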
Weekly proactive chain verification (cron)
Every Sunday 02:00 UTC, Kodori walks every tenant's hash chain per-partition (D288 chain-of-chains) and emits one audit.verification.completed per (tenant, partition) onto the per-tenant verification stream so the audit log itself records the proof. On failure, every workspace member gets a structured email with the first-mismatch detail AND the partition that broke. The on-demand /audit "Verify chain integrity" button (D289) renders per-partition status inline — operators clicking it see WHICH partition broke if anything fails, not a single ambiguous "mismatch detected." The on-demand verifier is "we verify when asked"; the cron is "we verify even when no one's watching" — the artifact auditors actually want.
Sub-processors page at /security/subprocessors
Every third-party service that may process Kodori customer data — 11 vendors (Vercel, Neon, Cloudflare, Anthropic, OpenAI, Resend, Stripe, Inngest, WorkOS, Google, Microsoft) — with vendor / purpose / what-they-actually-see / region / compliance reports. 30-day-written-notice change policy. Required reading for any GDPR / HIPAA / SOC 2 review.
Per-user DSAR export at /api/me/export
Every member can download their own data — authored documents (permission-trimmed), audit events they performed, saved agent conversations, profile record. Covers GDPR Article 15 (Right of Access) + Article 20 (user-driven portability) without owner involvement. Caps: 500 docs · 2 GB · 25k audit events.
/anomalies CSV export — quarterly compliance "every anomaly + decision + reason"
New /api/anomalies/export route (owner / admin only). 15-column CSV — id, kind, severity, status, actor info, occurrence_count, first/last_seen_at, decided_by + email, decided_at, decision_note, evidence_json. Optional URL query filters: ?from=YYYY-MM-DD&to=YYYY-MM-DD&status=open|acknowledged|dismissed|auto-paused. Date filter scopes to last_seen_at so a dismissed-last-quarter signal still surfaces. 50,000-row cap with comment-row truncation marker. The decision_note column captures D202 dismiss-with-reason text + the optional acknowledge note — compliance gets "47 dismissed last quarter, here's exactly why each one" without hand-building from /audit filters. Evidence kept as JSON since the shape varies per anomaly kind. Sister to D162 audit CSV.
Dismiss-with-reason on /anomalies — quick-pick chips + required note for audit defensibility
Previously, dismissing on /anomalies was one click with no note prompt; only acknowledge captured a reason. The per-row dismiss expand-form now mirrors the acknowledge flow: four quick-pick reason chips ("False positive — expected business activity", "Investigated, legitimate", "Duplicate of an earlier anomaly already triaged", "System / agent change planned and announced") above a free-text input. Confirm-dismiss is disabled until a reason exists. Closes the audit gap where "47 anomalies dismissed last quarter" carried no defensible "why" trail. Reason lands on the existing anomaly.dismissed event payload — no schema change. Operators can edit chip-prefilled text before confirming to add specifics ("Q4 audit prep — high regulated reads expected").
Anomaly detection with agent step-up + per-tenant threshold tuning
Every 15 minutes Kodori scans the audit log for high-volume regulated reads, cross-sensitivity bursts, off-hours spikes from a single user, held-document read spikes, and agent runaway loops. High-severity AGENT signals auto-pause the offending principal via a deny rule on /permissions; un-pausing requires a written rationale captured on the audit log. The /anomalies queue is owner / admin only. Per-tenant threshold tuning on /settings/tenant — high-volume workspaces raise to suppress false positives, low-volume ones lower to catch smaller patterns. Defaults stay tuned for typical usage (60min window, 25 regulated reads, 200 agent tool calls, 5 hold-deny refusals). Empty input = revert to platform default; takes effect on the NEXT 15-minute sweep tick. Idempotent saves emit tenant.anomaly-thresholds-set on the audit chain.
Bulk re-extraction sweep
When a new extractor lands (or an extraction event ever loses its worker), the dashboard surfaces a "Re-run for all" button covering never-queued, failed, unsupported, AND stuck-pending docs (>5 min idle). The click is durable: the request hands off to a background Inngest workflow that bulk-marks pending in 1,000-row chunks and fans out per-doc extraction events in 1,000-event batches — survives any tenant size without timing out. A per-tenant concurrency key of 1 prevents double-fanout from rapid re-clicks.
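The selection and batching logic above can be sketched in a few lines. The status strings and field names are hypothetical; the 5-minute idle threshold and the 1,000-item batch size come from the description.

```python
STUCK_AFTER_SECONDS = 5 * 60
BATCH_SIZE = 1000


def needs_reextraction(status: str, idle_seconds: int) -> bool:
    if status in ("never-queued", "failed", "unsupported"):
        return True
    # Pending docs only qualify once they have sat idle past the threshold.
    return status == "pending" and idle_seconds > STUCK_AFTER_SECONDS


def chunked(items, size=BATCH_SIZE):
    for start in range(0, len(items), size):
        yield items[start:start + size]


doc_ids = [f"doc-{n}" for n in range(2500)]
batches = list(chunked(doc_ids))
```

Fixed-size batches keep each unit of work bounded, so the total tenant size affects only how many batches run, not how long any one of them takes.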
CSV bulk metadata import — incumbent-DMS migration tool
The /migrate page accepts a CSV mapping each filename to its sensitivity, collection, retention class, and arbitrary metadata fields. Two-step preview-then-commit so the operator sees per-row outcomes (matched / unmatched / warning) before anything mutates. Each commit row goes through the same single-doc tools the UI uses, so audit trail + legal-hold deny-wins are identical to manual editing. Pairs with the existing folder-drop upload + email ingress to handle the bytes side. 10k row cap, 10 MB CSV cap.
Migration connectors (iManage / S3 / NetDocs / FileHold)
Pull documents directly from your incumbent DMS — no manual export. The S3-compatible bucket connector ships ready (AWS S3, Cloudflare R2, MinIO, Backblaze) — point it at any bucket your IT team can dump an export into; subdirectory paths become Kodori Collection paths and sidecar `*.kodori.json` files attach metadata. The iManage Work / Cloud connector is in beta — real REST integration with OAuth2 client-credentials, paginated discovery, custom-field metadata pass-through. NetDocuments lands Q3 2026 and FileHold lands Q4 2026 with locked credentials schemas already in place. Probe → discover → commit, all read-only on the source until commit moves bytes — discovery preview lets you scope the migration before any bandwidth is paid. Commit walks 100 docs per batch through the existing CAS pipeline; credentials are held in-memory only and never persisted in Kodori's DB.
Pillar 03
Organize without folder sprawl
Kodori has no folder tree. Documents live in Collections — saved views over your metadata. The same document can appear in matter, project, and cabinet contexts without a single duplicate copy.
Collections (cabinet / drawer / folder / project / matter / custom)
Pick a kind to match your firm's vocabulary. Pinned membership keeps explicit additions; rule-driven membership auto-includes documents that match a query.
Rename a collection — no recreate, no membership loss
Inline "Rename" affordance on /collections/[id] for the creator + tenant owner / admin. Saves through `renameCollectionAction` → `renameCollection` MCP tool, lands `collection.renamed` on the audit chain with previous + next values. Membership, rules, and ACL grants are unaffected — the rename is a pure metadata edit.
Rule-driven Collections
Describe a Collection in declarative terms — "every regulated PDF", "all NDAs from 2024", "every invoice tagged BigCo" — and matching documents auto-appear, computed at read time. Pinned documents still apply unless you make the Collection rule-only. The agent can author rules from natural language.
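Membership computed at read time might be sketched as follows, assuming a rule is a simple dict of field equality checks (real rules are presumably richer) and that all names here are hypothetical.

```python
def matches(doc: dict, rule: dict) -> bool:
    """True when every field named in the rule equals the doc's value."""
    return all(doc.get(field) == expected for field, expected in rule.items())


def collection_members(docs, pinned_ids, rule=None, rule_only=False):
    members = []
    for doc in docs:
        by_rule = rule is not None and matches(doc, rule)
        by_pin = (not rule_only) and doc["id"] in pinned_ids
        if by_rule or by_pin:
            members.append(doc["id"])
    return members


docs = [
    {"id": "d1", "sensitivity": "regulated", "mime": "application/pdf"},
    {"id": "d2", "sensitivity": "confidential", "mime": "application/pdf"},
    {"id": "d3", "sensitivity": "regulated", "mime": "text/plain"},
]
rule = {"sensitivity": "regulated", "mime": "application/pdf"}  # "every regulated PDF"
```

Nothing is stored per member for the rule path: a document that starts matching the rule appears on the next read, and one that stops matching drops out unless it is also pinned.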
One blob, many Collections
Pin a contract into a matter, a project, and a "current quarter" cabinet. Storage stays singular. Moves are metadata edits — links never break.
Create-from-folder upload affordance
After a folder upload finishes, Kodori offers to seed a Collection from the top-level folder name. The fastest way to bring a project from a shared drive into one organized view.
Share a whole Collection in one click
Owners and admins grant read access on a Collection from /collections/[id] — every document pinned to it (and any pinned later) becomes readable for that teammate. A 200-doc matter is one grant, not 200. Per-doc deny still wins, so you can lock down a single privilege-protected doc inside an otherwise-shared matter. The agent can author the same grants from natural language ("share the Smith matter with Bob as a viewer").
Collection-driven sensitivity + retention inheritance — matter / project metadata applies on member-add
Configure a default sensitivity tier and / or default retention class on a Collection of any kind (matter / project / drawer / folder / cabinet / custom), and every doc filed there inherits automatically. Sensitivity is highest-tier-wins: adding a doc escalates it to the Collection's tier if its own tier is lower, so a doc moved into a regulated matter becomes regulated, while a doc that's already restricted moved into a confidential folder stays restricted; we never demote. Retention is no-override: the default applies only when the doc has no retention class yet and never overwrites an existing assignment, because disposal cost compounds. Tenant owner / admin authors via setCollectionInheritance; applyCollectionInheritance backfills the rules across existing pinned + rule-matched members (paginated). Migration 0098 is the load-bearing schema work; the helper applies inheritance atomically with the membership row insert so the audit chain stays consistent. Closes the matter-level inheritance parity gap with iManage / NetDocuments / FileHold. Lowest-wins / strict-equality modes are deliberately not shipped: they are silent-demotion footguns the SOC 2 narrative can't defend without explicit customer demand. See D294.
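The two inheritance rules above can be sketched as a pure function. This is a minimal illustration, not Kodori's implementation: the tier ordering, the "internal" base tier, and the names `applyInheritance`, `DocState`, and `CollectionDefaults` are assumptions made for the example.

```typescript
// Tier order is an assumption: the text names confidential, restricted and
// regulated; "internal" is a hypothetical lowest tier added for illustration.
const TIERS = ["internal", "confidential", "restricted", "regulated"] as const;
type Tier = (typeof TIERS)[number];

interface DocState {
  sensitivity: Tier;
  retentionClass: string | null;
}

interface CollectionDefaults {
  defaultSensitivity?: Tier;
  defaultRetentionClass?: string;
}

// Highest-tier-wins for sensitivity, no-override for retention.
function applyInheritance(doc: DocState, defaults: CollectionDefaults): DocState {
  const next: DocState = { ...doc };
  if (
    defaults.defaultSensitivity !== undefined &&
    TIERS.indexOf(defaults.defaultSensitivity) > TIERS.indexOf(doc.sensitivity)
  ) {
    next.sensitivity = defaults.defaultSensitivity; // escalate, never demote
  }
  if (doc.retentionClass === null && defaults.defaultRetentionClass !== undefined) {
    next.retentionClass = defaults.defaultRetentionClass; // fill only when unset
  }
  return next;
}
```

Keeping the rule a pure function of the doc's current state and the Collection defaults makes the same logic usable both on member-add and in a paginated backfill.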
Bulk metadata across a collection / saved-search / uncollected — matter number, client code, parties, custom keys
Four bulk MCP tools cover the common collection-wide operations: bulkSetDocumentSensitivity, bulkSetDocumentRetentionClass, bulkAddDocumentsToCollection (inheritance applies), and bulkSetDocumentMetadata for arbitrary jsonb keys. The metadata patcher takes any patch object — matter number, client code, parties array, custom keywords, tenant-defined fields — and applies it across up to 500 docs per call (paginated for larger batches). Idempotent: keys whose value already matches are skipped without an event. Hold-deny-wins where applicable (sensitivity refuses to lower on legal-held docs). Reaches the same per-doc gates as the single-doc tools — permission trim, audit-event shape, and quota enforcement are identical to the per-doc UI. Tool descriptions explicitly call out "do NOT loop the per-doc tool for these requests" so the agent reaches for the bulk tool when the user asks to apply X to many documents at once. See D294.
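The idempotent patch behavior described above (keys whose value already matches are skipped without an event) might look like the following sketch. The names `applyPatch` and `sameValue` are hypothetical, and comparing values by their JSON serialization is a simplifying assumption; a real jsonb comparison would ignore object key order.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type Metadata = { [key: string]: Json };

// Assumption: serialized equality is close enough for a sketch. Object key
// order matters here, which a real jsonb comparison would ignore.
function sameValue(a: Json | undefined, b: Json): boolean {
  return a !== undefined && JSON.stringify(a) === JSON.stringify(b);
}

// Returns the patched metadata plus the keys that actually changed, so the
// caller can emit audit events only for real changes.
function applyPatch(
  current: Metadata,
  patch: Metadata
): { next: Metadata; changedKeys: string[] } {
  const next: Metadata = { ...current };
  const changedKeys: string[] = [];
  for (const [key, value] of Object.entries(patch)) {
    if (sameValue(current[key], value)) continue; // already matches: skip
    next[key] = value;
    changedKeys.push(key);
  }
  return { next, changedKeys };
}
```

Re-running the same patch against the result yields an empty `changedKeys` list, which is what makes a retried bulk call safe.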
Retention auto-apply rule backfill — apply a new rule to existing matched docs
When an admin adds a retention auto-apply rule (e.g. "every doc whose doc-type matches /invoice/ → 7-year retention"), the rule fires on new ingests automatically, but existing docs that match the pattern were previously unaffected unless someone re-uploaded them. The new applyRetentionRuleToMatchingDocs MCP tool walks every live doc whose proposed-or-accepted doc-type matches the rule's pattern and surfaces a retention proposal in metadata_suggestions for human review (the same shape auto-classify produces on first ingest). Default dryRun=true returns preview counts; the operator inspects + re-runs with dryRun=false. Skips docs that already have a retention class set (we never override an existing assignment). The "agent proposes, human confirms" governance posture extends to backfill: retention is governance state, and a wrong auto-apply has compounding cost. Tenant owner / admin only. See D294.
Pillar 04
Versions you can defend
Content-addressable identity means there's no "final_final_v3" guessing. The current version is unambiguous. Every prior version is immutable. Comparisons reach back as far as the bytes.
Upload new version of an existing document
Without creating a duplicate record. The previous version's metadata, permissions, and audit history carry forward; the new version becomes the current.
Check-in / check-out (soft edit lock)
Claim an exclusive edit window before working on a document. While the lock is held, other workspace members can't upload a competing new version. Uploading clears the lock atomically. Workspace admins can force-release a stuck lock; every state transition is on the audit log.
Optional human label per version
Inline-editable on the document page. "Final draft", "Sent to counsel", "Released" — versions carry the names that matter to humans.
Significance flag for review-cut versions
Mark versions that survive a review boundary so the audit trail surfaces them prominently.
Per-version download links
/api/doc/[id]/download?v=<hash> serves any historical version with a signed URL, scoped to the document.
Word-style redline compare (Litera-killer, built-in)
Compare any two versions in Word-style word-level redline view. Insertions render in green, deletions in red strikethrough, unchanged text neutral — the "Track Changes" reading legal + AEC reviewers expect for catching adds + removes in proximity. Two-state mode switch toggles to a unified line-level view for code or structured data. Mode rides in the URL so a specific view is shareable. Older versions re-extracted on demand. 200,000-char cap per side. Incumbents (iManage Workspace, NetDocuments) integrate with Litera as paid third-party software; Kodori's redline ships built-in, no extra contract.
Pillar 05
Governance, holds, retention
Records management designed for the firm that has to defend every decision. Legal holds and retention disposition share a deny-wins model that's easy to reason about under pressure.
Legal hold with subject-preservation
Bind documents to a matter; held records refuse to delete, refuse to dispose, refuse to downgrade sensitivity. Subjects stay on the hold record forever — release adds a reason; subjects don't disappear.
Cedar policy engine — real SDK wired today, hourly divergence-observation cron, shadow-mode (/policies)
Tenant admins author Cedar DSL policies at /policies, simulate them against a real Cedar engine (live SDK eval — engine-construction failures surface at simulation time), activate / archive them. Status flow: draft → active → archived. Real `@cedar-policy/cedar-authorization` SDK is wired today (D250) — `CedarInlineAuthorizationEngine.isAuthorized` runs against active policies via lazy server-only wasm load with per-tenant engine cache. Hourly Inngest cron (`cedar-divergence-observation`) replays write-side audit events through Cedar; when Cedar disagrees with the TS gate that already let an action through, emits `policy-engine.divergence` with the observed-event-id + cedar-action + cedar-decision + ts-decision + policy-version. TS gates REMAIN authoritative — Cedar accumulates divergence telemetry; the per-tenant cedar-authoritative flag flips after 30+ days of zero divergences. Default Kodori v1 schema bundled (User / Agent / System principals; Document / Collection resources; 9 standard actions). The escape hatch when a customer's policy ask doesn't map cleanly to roles + sensitivity tiers + collection grants — author it in Cedar, simulate against the real engine, ship it in shadow.
Per-resource access requests — members ask for read, admins approve from a queue
New /request-access form (any tenant member): paste a doc / collection UUID + an optional reason + Submit. Privacy-preserving by design — Kodori does NOT confirm the resource exists before queueing the request, so typos / random UUIDs sit in the queue without leaking which ids correspond to real resources. Owners + admins review at /access-requests with resolved resource names + Grant / Deny affordances. Granting INSERTs a permission row (action=read, resourcePattern=<kind>/<id>) AND emits TWO audit events: access-request.granted on the access-request stream + a standard permission.granted on the resource's stream with a grantedViaAccessRequest cross-reference in the payload — granted-via-request grants behave identically to admin-issued grants while keeping the request lineage queryable backwards. Sidebar count badge for admins. Email fan-out to up to 50 admin recipients on submit; in-app queue is the load-bearing surface.
Two-person delete on regulated documents — dual-control governance
Standard ask from healthcare / finance / government customers. When an operator clicks Delete on a sensitivityLabel=regulated document, the doc goes into a pending_deletions queue — a SECOND admin (≠ the requester, server-side enforced) must approve before the tombstone fires. New /pending-deletions admin queue lists active requests with doc / requester / reason + Approve / Reject. Approve invokes the standard tombstone path, so the legal-hold deny-wins gate still applies if a hold was applied between request and approval. Approval is recorded BEFORE the tombstone fires so the audit chain captures the human decision separately from the resulting destruction. 14-day TTL on pending requests. Three new event types: document.delete-requested / .approved / .rejected. The original requester can self-cancel (emits .rejected with cancelledByRequester: true so audit can distinguish). Other sensitivity tiers keep the single-person flow.
Custodian acknowledgment progress bar on /legal-holds/[id] — "12 of 18 acknowledged" at-a-glance
Above the custodian table, a stacked tri-segment progress bar surfaces acknowledgment status: emerald = acknowledged, amber = notified-pending, gray = unsent. Header shows "X of Y acknowledged" with a color-coded percentage badge (emerald 100%, amber ≥50%, neutral otherwise). Per-segment hover tooltip surfaces the bucket count. Hidden when no custodians exist (no clutter on a fresh hold). Lives above the Nudge / Re-send buttons so admins read status → action in one downward scan. Closes the at-a-glance "where do I stand on this hold?" question that today requires manual row-counting.
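As a rough sketch of the header and badge logic above: the function below derives the "X of Y acknowledged" text and the badge tone from three bucket counts. The names are hypothetical, and rounding the percentage down (so 100% only appears when every custodian has acknowledged) is an assumption.

```typescript
type BadgeTone = "emerald" | "amber" | "neutral";

interface CustodianCounts {
  acknowledged: number;
  notifiedPending: number;
  unsent: number;
}

interface AckProgress {
  header: string;
  percent: number;
  tone: BadgeTone;
}

// Returns null when there are no custodians, mirroring the hidden state.
function ackProgress(c: CustodianCounts): AckProgress | null {
  const total = c.acknowledged + c.notifiedPending + c.unsent;
  if (total === 0) return null;
  // Assumption: floor, so a partial result never displays as 100%.
  const percent = Math.floor((c.acknowledged / total) * 100);
  const tone: BadgeTone =
    c.acknowledged === total ? "emerald" : percent >= 50 ? "amber" : "neutral";
  return { header: `${c.acknowledged} of ${total} acknowledged`, percent, tone };
}
```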
Litigation hold notice emails — name custodians, send single-use ack URLs, get an audit-logged "received" click back
The standard ediscovery requirement that incumbent DMS punt to a separate tool. Custodian table on /legal-holds/[id] takes a paste-many email list (same shape as bulk invite, capped at 100), Kodori dedupes against existing custodians, and persists each row alongside the hold. Click "Send hold notices" and Kodori emails every custodian a litigation-hold notice with the matter ref, workspace name, verbatim hold scope, and a single-use acknowledgment URL pointing at /legal-hold-ack/[token]. Re-sending rotates the token (so a leaked URL from a prior send becomes inert) and clears the prior ack so the recipient confirms the LATEST scope. The custodian opens the URL — no Kodori sign-in required — reads the scope, clicks "I acknowledge", and the click stamps acknowledgedAt + appends legal-hold.notice-acknowledged with actorId=public-hold-ack:<prefix> so the audit chain captures external acks distinguishably from authenticated activity. Idempotent. Per-row status (Acknowledged / Pending / Not sent) on the matter page so counsel sees at a glance who's covered. Two resend modes: "Re-send to all (rotates tokens)" for material scope changes, "Nudge unacknowledged (N)" for nudging laggards without disrupting acknowledged custodians — the nudge button only renders when there are people to nudge, with a count badge.
Semantic candidate finder for hold scope
Describe what the matter is about and Kodori runs the same hybrid search the agent uses, ranks candidates with snippets, and marks anything already bound. Bulk-bind the rest in a single click — no UUID copy-paste, no spreadsheet of doc IDs. The agent does the same flow from natural language and always previews before binding.
Held-doc deny-wins enforced top to bottom
The doc-detail Delete button disables itself with a "Release the hold first" tooltip. The retention review queue shows held docs but disables disposal. The MCP tools enforce the same gate server-side. Three lines of defense for the same invariant.
Retention classes
Define record categories with retention terms ("Tax records — 7 years"). Each class has a disposition mode (review / dispose-with-review) and a slug your firm can standardize on.
/retention disposal countdown — "Next eligible disposal: 2027-04-15 (in 11mo)" per class
Every retention class with at least one live doc renders an inline disposal-date line. Computed as min(documents.createdAt) + retainForYears years over live (non-tombstoned) docs in the class. Tone-coded: red within 30 days (urgent), amber within 180 days (heads-up), gray further out. Renders both an ISO date (for compliance write-ups) and a relative time ("in 11mo", "in 2y") for at-a-glance daily triage. Tail note distinguishes review-disposition ("lands on /retention/review for human confirmation") from auto-tombstone-disposition ("auto-tombstones via daily cron"). Suppressed for empty classes. Computed by an inline subselect on the existing /retention query, so the data is fresh, with no cron lag and no denormalization invalidation surface.
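The date arithmetic behind the countdown can be illustrated in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, dates are handled in UTC, and the boundaries are treated as inclusive (exactly 30 days out counts as red).

```typescript
type Tone = "red" | "amber" | "gray";
const DAY_MS = 24 * 60 * 60 * 1000;

// min(createdAt) + retainForYears, or null for an empty class (suppressed).
function nextEligibleDisposal(createdAts: Date[], retainForYears: number): Date | null {
  if (createdAts.length === 0) return null;
  const earliest = Math.min(...createdAts.map((d) => d.getTime()));
  const due = new Date(earliest);
  // Note: a Feb 29 start rolls over to Mar 1 in non-leap target years.
  due.setUTCFullYear(due.getUTCFullYear() + retainForYears);
  return due;
}

// Assumption: thresholds are inclusive.
function disposalTone(due: Date, now: Date): Tone {
  const daysOut = (due.getTime() - now.getTime()) / DAY_MS;
  if (daysOut <= 30) return "red";
  if (daysOut <= 180) return "amber";
  return "gray";
}
```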
Retention review queue (human-confirmed)
When a record's retention term elapses, it surfaces in /retention/review. Two actions: defer N years with a captured reason, or dispose now (calls tombstoneDocument). Auto-tombstoning is intentionally absent in v0 — see our stack-decisions doc.
Retention auto-apply rules
Map a docType pattern to a retention class — Kodori auto-suggests the right class on every NDA, invoice, 1099, RFI, engagement letter. Acceptance is still human; the rule never auto-mutates retention, just adds the proposal alongside sensitivity / collection / keywords on the suggestion panel.
Per-doc scheduled deletion — auto-tombstone on a date with hold-deny-wins applying
Set an explicit deletion date on a single document (admin / owner only). Daily 02:00 UTC sweep runs the standard tombstone path on docs whose date has passed; the standard legal-hold deny-wins gate still applies — held docs survive past their auto-delete date with the attempt logged via document.auto-delete-blocked-by-hold for forensic clarity. Distinct from retention classes (which are class-level POLICY with human-confirmed disposal); auto-delete is a one-off automatic action with no human in the loop at deletion time. Common patterns: NDAs that expire 5 years post-signing, marketing collateral with campaign deadlines, contractor records with mandatory purge dates. Cancellable any time before execution. Capped at 10 years out so fat-finger date entries don't stick around for centuries. Cursor-paginated at 100M-doc-tenant scale (D282) — the sweep stores resume state in cron_checkpoints and incrementally drains backlogs across consecutive runs without ever wedging on a stuck row.
Auto-delete reminder email — 14-day-out warning before the cron tombstones
The same daily sweep that tombstones expired docs ALSO finds docs scheduled to delete in the next 14 days and emails the doc creator + workspace owners + admins with the doc name, ID, scheduled date, reason, and a deep link to extend or cancel. Throttled to one email per ~13 days inside the window so the same doc doesn't spam inboxes. Re-armed on every set / clear so a freshly-rotated date generates a fresh reminder. New document.auto-delete-warning-sent event lands on the audit chain with recipientsCount in the payload. Closes the "I forgot it was scheduled" failure mode where operators returning from PTO discovered their record had silently auto-deleted.
Hash-chained event log per tenant
Every consequential mutation appends an event. Each row's prev_hash is the SHA-256 of the previous event — tampering is detectable without re-running anything. The substrate underneath SOC 2, 21 CFR Part 11, and any FRCP discovery. Chain-of-chains partitioned by calendar quarter (D287) at 100M-doc-tenant scale — verification can run per-partition with the inter-partition links collectively proving "the entire history is intact" without walking 1B+ events in one pass.
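To make the chaining idea concrete, here is a small self-contained sketch using Node's built-in crypto module. It is not Kodori's schema: the event fields, the all-zero genesis value, and the JSON-array serialization that gets hashed are assumptions chosen for illustration.

```typescript
import { createHash } from "node:crypto";

interface ChainEvent {
  id: number;
  type: string;
  payload: string;
  prevHash: string;
}

// Hypothetical genesis value for the first event in a chain.
const GENESIS = "0".repeat(64);

// Assumption: the hash covers a fixed-order JSON array of the event's fields,
// including its own prevHash, so each link commits to all history before it.
function hashEvent(e: ChainEvent): string {
  return createHash("sha256")
    .update(JSON.stringify([e.id, e.type, e.payload, e.prevHash]))
    .digest("hex");
}

function appendEvent(chain: ChainEvent[], type: string, payload: string): ChainEvent[] {
  const prev = chain[chain.length - 1];
  const prevHash = prev ? hashEvent(prev) : GENESIS;
  return [...chain, { id: chain.length + 1, type, payload, prevHash }];
}

// Returns the index of the first row whose prevHash does not match, or -1.
function firstMismatch(chain: ChainEvent[]): number {
  for (let i = 0; i < chain.length; i++) {
    const expected = i === 0 ? GENESIS : hashEvent(chain[i - 1]);
    if (chain[i].prevHash !== expected) return i;
  }
  return -1;
}
```

In this sketch an edit to event k is reported at row k + 1, the first row whose stored hash no longer matches. Detecting an edit to the newest event would need the head hash recorded somewhere outside the chain.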
Compliance evidence packet — one-click PDF for the auditor visit
Owner / admin clicks Generate, browser downloads a single PDF with a cover page (tenant + window + record counts), live audit-chain integrity verification (PASS/FAIL with first-mismatch detail when broken), every legal hold + retention class, member roster with role-distribution summary, and the top-40 event types in the window with share-of-traffic percentages. Hash-stamped at generation — the SHA-256 of the bytes lands on the audit chain alongside a compliance.evidence-packet-generated event recording who/when/what scope. The auditor takes the PDF home; the chain proves the bytes weren't tampered with later. Designed for SOC 2 visits, 21 CFR Part 11 inspection prep, and FRCP discovery exhibits.
Audit log row grouping — readable at scale
Adjacent same-type-same-actor events within a 60-second window collapse into one expandable row with a "× N" badge. A bulk action that emits 50 collection.member-added events shows as one line instead of 50 — the audit page stays scannable even after a 200-row migration. Click to drill into the individual events.
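The collapsing rule above can be sketched as a single pass over an already-sorted event list. The names are hypothetical, and measuring the 60-second window from the first event of each group (rather than from the previous event) is an assumption.

```typescript
interface AuditEvent {
  type: string;
  actorId: string;
  at: number; // epoch milliseconds
}

interface AuditGroup {
  type: string;
  actorId: string;
  events: AuditEvent[]; // events.length drives the "× N" badge
}

const WINDOW_MS = 60_000;

// Collapses adjacent events with the same type and actor. Assumption: the
// window is anchored on the first event of the current group.
function groupAdjacent(events: AuditEvent[]): AuditGroup[] {
  const groups: AuditGroup[] = [];
  for (const event of events) {
    const last = groups[groups.length - 1];
    const anchor = last ? last.events[0] : undefined;
    if (
      last &&
      anchor &&
      last.type === event.type &&
      last.actorId === event.actorId &&
      Math.abs(event.at - anchor.at) <= WINDOW_MS
    ) {
      last.events.push(event);
    } else {
      groups.push({ type: event.type, actorId: event.actorId, events: [event] });
    }
  }
  return groups;
}
```

Using the absolute time difference lets the same function work whether the page lists events newest-first or oldest-first.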
Matter timeline view at /collections/[id]/timeline
The third audit-side surface alongside per-doc /doc/[id] history and global /audit. Aggregates every event across every document pinned to a matter PLUS the matter's own events (member-added, rule-updated, permission-granted) into one chronological narrative. Day-grouped with sticky headers, color-coded by event tone (red = DLP/anomaly, amber = legal-hold/retention/purge, emerald = create/version/annotation, blue = permission/collection). Filter chips computed from in-page data so operators only see types that exist in this matter. Permission-trimmed.
One-click reversibility
Most mutations have an inverse tool. From /audit, click "Revert" on a row — Kodori dispatches the inverse and appends a fresh forward event. The chain stays intact; the original event isn't edited.
Soft-delete with byte preservation
Tombstoning a document flips status to "tombstoned" but leaves the bytes in object storage and the audit trail intact. Recovery during the retention window is straightforward; hard purge happens behind a separate sweep.
Trash bin with multi-select bulk restore
/trash lists every tombstoned doc in the workspace, newest-first, with sensitivity / size / deleter / date columns + search-by-name. Owners and admins multi-select via checkboxes and bulk-restore with a shared reason; per-doc failures don't block the rest. Each restore appends document.restored to the doc's audit stream alongside the prior tombstoning context — the chain captures both the deletion AND the recovery as paired events. Solves the "we deleted some files we shouldn't have" recovery flow without a database admin.
Legal citation extraction — seven-kind regex index per doc
Click "Extract citations" on any /doc/[id] and Kodori scans the extracted text against curated patterns for cases (against a 30-reporter list including SCOTUS, federal circuits + supplements, state regional reporters), statutes (U.S.C.), regulations (C.F.R.), procedural rules (FRCP / FRCRP / FRAP / FRBP), evidence rules (FRE), federal docket numbers, and constitutional clauses. Repeat citations within one doc collapse to one row with an occurrence count. Idempotent on re-run via SHA-1 fingerprint dedup. Precision-over-recall — false-positive bias is the wrong tradeoff for a legal index. Permission-trimmed; the extraction event lands on the doc's audit stream so the matter timeline + /audit see citation work as a first-class governance signal.
Per-matter citation rollup at /collections/[id]/citations
Aggregates every readable doc's citation index across the matter, ranked by total occurrences. Same citation appearing in five briefs collapses to one row with a per-doc breakdown showing where it lives. Filter chips for the seven kinds. Coverage stat shows "documents with citations / documents in scope" so operators see how complete the index is. Permission-trimmed: a screened attorney sees no citations from matters they're walled off from. Answers the partner question "what does this matter rely on?" in one view.
Citation alerts — email me when a new doc cites this
Subscribe a citation at /citations/alerts (case, statute, regulation, docket, etc.). When a new document lands and the citation extractor picks up your citation, Kodori sends an email — substring match so subscribing to "347 U.S. 483" also catches "Brown v. Board, 347 U.S. 483, 495 (1954)" parenthetical references. Optional kind filter; pause / resume per alert; permission-trimmed (alerts only fire for docs the subscriber can read). Lifecycle (created / paused / fired / removed) all on the audit chain. The litigation power-tool: subscribe an authority you're tracking, get pinged the moment opposing counsel's next brief lands citing it.
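A minimal sketch of the substring match described above, assuming a simple whitespace and case normalization before comparison. The normalization step and all names here are hypothetical, and permission trimming is left out.

```typescript
interface CitationAlert {
  citation: string;
  kind?: string; // optional kind filter
  paused: boolean;
}

interface ExtractedCitation {
  text: string;
  kind: string;
}

// Assumption: collapse whitespace and lowercase before comparing.
function normalizeCitation(s: string): string {
  return s.replace(/\s+/g, " ").trim().toLowerCase();
}

// True when an active alert's citation appears anywhere inside the extracted
// citation text and the optional kind filter agrees.
function alertFires(alert: CitationAlert, found: ExtractedCitation): boolean {
  if (alert.paused) return false;
  if (alert.kind !== undefined && alert.kind !== found.kind) return false;
  return normalizeCitation(found.text).includes(normalizeCitation(alert.citation));
}
```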
Saved-search alerts — email me when a new doc matches my query
Subscribe an alert on any saved search at /search/alerts. When a newly-extracted document matches the saved search's keyword query (Postgres FTS via websearch_to_tsquery against document_content.text), Kodori emails the recipient with a windowed excerpt (~24 words around the FTS hit) so the matching context is visible in the inbox. Permission-trimmed against the subscriber's read access. Pause / resume per alert; full audit trail. General-purpose sister feature to citation alerts — same dispatcher pattern, broader query primitive.
Saved-search results CSV export — dump every match to RFC 4180 CSV
"Export matching docs to CSV ↓" link on /search/alerts per saved search. Streams up to 1000 hits as RFC 4180 CSV with documentId, displayName, mimeType, sensitivityLabel, sizeBytes, currentVersionHash, createdAt, lastModifiedAt. Permission-trimmed (you only see rows you can read). Postgres FTS only — same cost posture as the saved-search alerts dispatcher. The saved search's sensitivity + mime-family filters carry through. X-Kodori-Cap-Hit response header reports whether the 1000-row cap was reached. Each export emits saved-search-alert.fired with kind="csv-export" on the audit chain so admins reviewing the log see who exported what when. Standard ediscovery custodian-list workflow: save the search, share with co-counsel, export when the privilege log is due.
Ad-hoc CSV export from /search — no save-first required
"Export CSV ↓" button next to "Save this search" on /search. Operators dump interactive search results to CSV without first saving the search. Same column shape, same 1000-row cap, same permission-trim, same FTS-only cost posture as the saved-search export — just a different entry point that closes the friction of the "save the search first" flow. Active sensitivity + mime-family filters carry through. For one-off queries; recurring exports use the saved-search route so the audit chain captures the recurring activity.
Tenant-wide citation + drawing search at /citations and /drawings
Drilldown sister to the per-doc panel and per-collection rollups. Type a citation ("347 U.S. 483", "28 U.S.C. § 1331") at /citations OR a sheet number ("A-201", "S101") at /drawings and Kodori returns every readable doc that references it across the entire tenant, ranked by total occurrences. Empty state surfaces the top-25 most-referenced as a starting view. Permission-trimmed against canReadDocument: a screened attorney sees no citations from matters they're walled off from. Backed by the existing (tenantId, normalized) indexes from D140/D143, which stay fast enough for interactive search even when an index holds hundreds of thousands of rows.
Annotations: threaded notes, @mentions, resolved state
Notes thread one level deep on every document. Reply on an open thread, @mention a teammate by email and Kodori sends them a notification with a deep link to the thread, mark a thread "Resolved" when you're done. The author, anyone @mentioned, or a workspace admin can resolve and reopen — every state change lands on the per-doc audit timeline so the conversation around a record is part of the record. The agent can author and resolve threads on your behalf.
Annotation auto-resolver — stale threads (30+ days no activity) auto-resolve daily
Closed matters accumulate open @-mentions in /mentions forever, ping-fatiguing operators with no actionable signal. Daily Inngest cron at 04:00 UTC finds root annotations where: resolvedAt IS NULL, createdAt < now - 30 days, AND no reply newer than 30 days, AND the underlying document is still live. Auto-resolves with actorKind=system. Reuses the existing annotation.resolved event type with autoResolved=true + staleDays=30 in the payload — distinguishable from operator-resolved without polluting the EventTypeSchema. Per-run cap of 500 so the cleanup drains gradually on workspaces with massive backlogs. Future revisit: per-tenant configurable threshold via tenant_settings.
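The selection criteria above translate into a straightforward predicate. The sketch below is illustrative only: the `RootThread` shape and function names are assumptions, and the real job runs as a database query rather than an in-memory filter.

```typescript
interface RootThread {
  resolvedAt: Date | null;
  createdAt: Date;
  lastReplyAt: Date | null; // null when the thread has no replies
  documentStatus: "live" | "tombstoned";
}

const STALE_DAYS = 30;
const RUN_CAP = 500;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Open, older than the threshold, no recent reply, and the doc is still live.
function isStale(t: RootThread, now: Date): boolean {
  const cutoff = now.getTime() - STALE_DAYS * MS_PER_DAY;
  return (
    t.resolvedAt === null &&
    t.createdAt.getTime() < cutoff &&
    (t.lastReplyAt === null || t.lastReplyAt.getTime() < cutoff) &&
    t.documentStatus === "live"
  );
}

// Applies the per-run cap so large backlogs drain across several runs.
function selectForAutoResolve(threads: RootThread[], now: Date): RootThread[] {
  return threads.filter((t) => isStale(t, now)).slice(0, RUN_CAP);
}
```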
Daily / weekly activity digest emails
Optional Resend-backed email summarizing open @mentions, new documents added in your readable scope, new legal holds, retention review queue depth (for admins), workspace governance queue (admin-only: pending two-person deletes, access requests, recent anomalies), AND AEC schedule risk (overdue + due-soon RFIs and submittals). Off by default; opt in from /settings/account → Activity digest. Daily fires at the next-eligible top-of-hour after 23h elapsed; weekly fires Mondays at 08:00 UTC. Document-level sections are permission-trimmed via canReadDocument, so docs you can't read never appear. AEC schedule-risk and governance counts are computed at the tenant level so every team member sees the same "your project is X items behind / N admin actions pending" signal; row deep-links jump to the matching admin page for one-click triage. RFC 8058 one-click unsubscribe. Audit-logged via digest.sent / digest.failed / digest.frequency-changed.
/mentions inbox — persistent surface for every thread tagging you
Closes the email-only loop: @mentions used to fire one Resend email and disappear — miss or delete the email and you lose the work. /mentions is now canonical. Lists every readable annotation thread tagging you across the tenant, newest-first, with Open / Resolved filter chips. Permission-trimmed: a doc you've been walled off from drops out even if a historical mention exists. Backed by the gin jsonb_path_ops index on annotations.mentioned_user_ids — fast at any tenant scale.
Regenerate AI summary on /doc/[id] — refresh stale summaries when context shifts
Click "Regenerate" beside the AI-summary callout. Kodori re-fires the auto-classify Inngest function which recomputes the 3-sentence summary alongside sensitivity / collection / keywords / docType in the same Haiku call (no extra LLM cost). Open to anyone with read access (the gate is LLM cost, not data sensitivity). Audit-logged via document.auto-classify-requested with reason: manual-regenerate. Async — the button shows "Refreshing…" and the page refreshes after 8s. Auto-classify's global concurrency cap of 10 means button-click floods queue naturally rather than overrunning the LLM. Useful when a doc gets re-uploaded, its classification context shifts, or the original summary missed a key clause.
AI document summaries — Haiku writes "what is this?" on every ingest
Auto-classify pipeline includes a 3-sentence summary alongside the existing sensitivity / collection / keywords / docType outputs (no extra LLM round-trip — same Haiku call). First sentence = what + who; second = the substance; third = the operative date / deadline / next action. Surfaces below the document name on every /dashboard recent-docs row + as an ochre-tinted callout above the preview on /doc/[id]. Operators see "what is this?" at a glance without opening the file. Cost-flat; high-visibility daily-use win.
Stale-proposal expiry — review queue auto-cleans low-confidence suggestions after 30 days
Auto-classify proposes sensitivity / collection / keywords / doc-type / retention-class on every uploaded doc. Most proposals get accepted within a day or two; some land at low confidence (below 0.5 — the model's own midpoint) and the operator chooses not to act. Without cleanup, those low-confidence rows sit in the review queue forever, drowning the high-confidence proposals that actually need attention. New weekly cron (Sundays 07:00 UTC) flips proposed → expired for rows below 0.5 confidence + older than 30 days; expired rows drop out of the review queue but stay in the database as audit-trail evidence. New versions of a doc upsert on (documentId, kind) and clobber the expired row, so storage doesn't grow indefinitely. Operational hygiene only — no per-doc audit events, since proposals are advisory data not governance state. See D295.
Public share links — tokenized read-only URLs for external recipients (closes the discovery production loop)
Inline "Share via link" button on /doc/[id], /productions/[id], + future surfaces. Operator types optional label / recipient hint / expiry; click Create. Plaintext URL surfaces ONCE; only SHA-256 hash stored. Tokens are 32 bytes of crypto randomness (256 bits entropy). Default 14-day expiration; max 90. Production-source share links serve the EXACT bytes captured when the production was recorded so a recipient downloading from "Production Set 1 — Jan 15" gets the bytes that were ORIGINALLY produced even after later re-stamps. Three new event types (share-link.created / share-link.accessed / share-link.revoked) audit-log the full lifecycle; access events use actorId="public-share:<prefix>" so external hits are distinguishable from authenticated ones. New /share-links admin surface lists every link with status / recipient / access count / last-access / Revoke. Closes the discovery production loop end-to-end: classify → redact → stamp → record → bind → SHARE.
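The token handling above (random plaintext shown once, only a SHA-256 hash stored) follows a common pattern that can be sketched with Node's crypto module. Everything named here is hypothetical, the base64url encoding is an assumption, and clamping an over-long expiry to 90 days (rather than rejecting it) is also an assumption.

```typescript
import { createHash, randomBytes } from "node:crypto";

const DEFAULT_EXPIRY_DAYS = 14;
const MAX_EXPIRY_DAYS = 90;
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

function sha256Hex(value: string): string {
  return createHash("sha256").update(value).digest("hex");
}

interface NewShareLink {
  token: string; // plaintext, returned to the operator once and not stored
  tokenHash: string; // the only form kept server-side
  expiresAt: Date;
}

// Assumption: out-of-range expiry values are clamped to 1..90 days.
function createShareToken(
  expiryDays: number = DEFAULT_EXPIRY_DAYS,
  now: Date = new Date()
): NewShareLink {
  const days = Math.min(Math.max(1, Math.floor(expiryDays)), MAX_EXPIRY_DAYS);
  const token = randomBytes(32).toString("base64url"); // 32 bytes = 256 bits
  return {
    token,
    tokenHash: sha256Hex(token),
    expiresAt: new Date(now.getTime() + days * ONE_DAY_MS),
  };
}

// Lookup side: hash whatever the recipient presents and compare to storage.
function tokenMatches(presented: string, storedHash: string): boolean {
  return sha256Hex(presented) === storedHash;
}
```

Because only the hash is stored, a database leak does not expose working URLs, and a lost plaintext link cannot be recovered, only replaced.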
Watch-folder sync companion CLI — kodori-sync drops new files from a folder into Kodori automatically
New @kumokodo/sync-companion package shipping `kodori-sync` Node CLI. Watches configured paths via chokidar with awaitWriteFinish (5s default stability threshold so half-written files don't get uploaded), POSTs new + modified files to /api/v1/documents using a documents:write-scoped API key. Three commands: `kodori-sync init` writes a starter config to ~/.kodori/sync-companion.json, default runs the foreground watcher, `--once` does a one-shot backfill. Forwards X-Kodori-Sensitivity / X-Kodori-Collection-Id from config and adds X-Kodori-Metadata: { syncCompanion: { sourcePath, hostname, pushedAt } } so operators trace which sidecar pushed a doc on /audit + /doc/[id]. 50 MB sync-upload cap matches the API; oversize files emit a clear skip message rather than wasting the round-trip. Closes the §15.2 "watch-folder ingest agent" deferral; Electron tray wrapper (auto-updater, OS notifications, queue persistence) deferred to customer demand.
Saved-search hits in the activity digest — top-5-by-count rows in the existing email instead of a separate digest
D150 daily / weekly activity digest now includes a "Saved searches with new hits" section. Per-user fan-out runs a permission-trimmed FTS count for each of your saved searches, constrained to the digest window with the saved search's sensitivity / mimeFamily filters applied. Top 5 by hit count surface in the email with deep links back to /search?savedSearch=<id> for one-click load. Reuses the existing Resend send + reply-to + unsubscribe plumbing — no new email infra. The in-app "new since last viewed" badge on /search chips remains the read-time complete view; the digest is the proactive push for users who'd rather get nudged in their inbox. Closes the §15.2 saved-search digest deferral.
BYO-key lifecycle audit events — every change to your key custody on the hash-chain
Three new event types — tenant-kms.registered (first-time BYO-key adoption), tenant-kms.rotated (carries from/to keyIdSuffix + provider), tenant-kms.disabled (keyRowId + previousStatus) — emitted from the existing register / rotate / disable actions on /encryption. Full keyIds (ARNs / Key Vault URLs / GCP CryptoKey paths) stay in the tenant_kms_keys table; only the last 12 chars surface in audit payloads to avoid leaking sensitive identifiers. /audit chip catalog under "API keys + digests" lets admins filter to types=tenant-kms.* for "every change to key custody in YYYY." The D196 inline diff badge surfaces "provider: aws-kms → gcp-kms" + "keyIdSuffix: …abc → …xyz" chips on rotated rows. Closes the SOC 2 / 21 CFR Part 11 gap where key rotation silently flipped status without an audit row. Re-wrap pipeline (live AWS/Azure/GCP KMS SDK calls) plan documented at docs/plans/tenant-key-rewrap-plan.md.
/share-links sortable columns — sort by Accesses / Last access / Expires / Created
Clickable column headers on /share-links. ?sort=<key>&dir=asc|desc URL params; clicking the same header flips direction, clicking a different column resets to desc. ISO-8601 lexical comparison for the timestamp columns is chronologically correct. Combines with the D194 q + status filters — sort happens AFTER filter so "highest-accessed link in matter X" is one click of the search box then one click of the Accesses header. Active column shows arrow indicator (↑ / ↓). URL-state sortable so views are bookmarkable + middle-clickable for new tab.
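The header-click rule above (same column flips direction, different column resets to desc) can be sketched as a pure state transition plus a comparator. The type and function names are illustrative assumptions, and the lexical timestamp comparison assumes every value uses the same fixed-width UTC ISO-8601 format.

```typescript
type SortKey = "accesses" | "lastAccess" | "expires" | "created";
type SortDir = "asc" | "desc";
interface SortState { key: SortKey; dir: SortDir }

// Clicking the active column flips direction; clicking another column resets to desc.
function nextSort(current: SortState, clicked: SortKey): SortState {
  if (current.key === clicked) {
    return { key: clicked, dir: current.dir === "asc" ? "desc" : "asc" };
  }
  return { key: clicked, dir: "desc" };
}

interface ShareLinkRow {
  accesses: number;
  lastAccess: string; // ISO-8601 UTC timestamps, compared lexically
  expires: string;
  created: string;
}

function compareValues(x: string | number, y: string | number): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  const sx = String(x);
  const sy = String(y);
  return sx < sy ? -1 : sx > sy ? 1 : 0;
}

// Sorting runs on the already-filtered rows and never mutates the input.
function sortRows(rows: ShareLinkRow[], state: SortState): ShareLinkRow[] {
  const sign = state.dir === "asc" ? 1 : -1;
  return [...rows].sort((a, b) => sign * compareValues(a[state.key], b[state.key]));
}
```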
/share-links search + status filter — find every link to @firm.com without scrolling
Text input filters across label + recipient hint + token prefix + target name (substring, case-insensitive). Status select filters to active / expired / revoked / exhausted. URL-state captured so filter combinations are bookmarkable and shareable. Counter shows "Showing N of M" when filters are active — operators spot when results don't match expectations. Works without JS (GET form action), back-button returns to the previous filter, middle-click on saved filter URLs opens in new tab. In-memory filter over the existing 200-row list cap so no new SQL query / no new action signature — sub-millisecond at scale.
Bulk revoke share-links — clean up after matter closure with one click instead of N
Each active row on /share-links gets a checkbox; the Select-all-active header button selects every still-live link. Hit "Revoke selected" to revoke up to 200 share-links in one gesture. Hard tenant scope on the lookup so a malicious caller passing foreign ids gets not-found for them. Per-row failures (already-revoked, not-found) are counted but don't halt the batch — confirmation banner reports "Revoked N · M already revoked · K not found" for partial-success transparency. Each successful revoke emits the same share-link.revoked event a single-row revoke does (with `bulk: true` on the payload so audit consumers can distinguish), so the chain treats a 50-link bulk identically to 50 individual revocations.
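A rough in-memory sketch of the batch semantics above: tenant-scoped lookup, per-row failures counted without halting, and one event per successful revoke with `bulk: true`. The `ShareLink` shape and the use of a `Map` in place of the database are assumptions for illustration.

```typescript
interface ShareLink { id: string; tenantId: string; revoked: boolean }
interface RevokedEvent { type: "share-link.revoked"; shareLinkId: string; bulk: true }
interface BulkRevokeResult {
  revoked: number;
  alreadyRevoked: number;
  notFound: number;
  events: RevokedEvent[];
}

function bulkRevoke(links: Map<string, ShareLink>, tenantId: string, ids: string[]): BulkRevokeResult {
  if (ids.length > 200) throw new Error("at most 200 share-links per bulk revoke");
  const result: BulkRevokeResult = { revoked: 0, alreadyRevoked: 0, notFound: 0, events: [] };
  for (const id of ids) {
    const link = links.get(id);
    // Foreign-tenant ids are indistinguishable from missing ids.
    if (!link || link.tenantId !== tenantId) { result.notFound += 1; continue; }
    if (link.revoked) { result.alreadyRevoked += 1; continue; }
    link.revoked = true;
    result.revoked += 1;
    result.events.push({ type: "share-link.revoked", shareLinkId: id, bulk: true });
  }
  return result;
}

// Banner text in the format quoted above.
function banner(r: BulkRevokeResult): string {
  return `Revoked ${r.revoked} · ${r.alreadyRevoked} already revoked · ${r.notFound} not found`;
}
```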
Share-link direct email delivery — Kodori sends the URL to the recipient
When the recipient hint is a valid email, tick "Email this URL directly to the recipient hint" on the share-link form. Kodori sends a styled email via Resend with the URL + workspace name + operator name + expiry date. Reply-to set to the operator's email so opposing counsel can reply natively. Subject format ("X shared a doc with you — Smith production") matches the "personal share" pattern Dropbox / Google Drive / Notion use, so inbox filters catch the email instead of treating it as marketing. Skips the copy-paste-into-Slack workflow that's a transcription-error surface for high-volume ediscovery delivery. Best-effort — if Resend fails, the share-link is still created and the operator gets the URL back to copy manually.
External recipient email verification on share-links — 6-digit code unlocking
Tick "Require recipient to verify their email" on the share-link form for high-stakes deliveries. The recipient hits the URL and sees an email entry form instead of the doc — they enter their email, Kodori sends a 6-digit code (HMAC-keyed with AUTH_SECRET + share-link-id so codes can't be replayed across links; 15-min TTL; max 5 attempts), they enter the code, Kodori sets a 24-hour HMAC-signed cookie scoped to the share-link. Subsequent visits within 24h bypass the gate. Stacks with TTL (D128) + access cap (D161) + watermark (D157) + access notification (D159). Audit chain logs only the email DOMAIN on verification events (privacy posture: full address stays in the verifications table for in-product traceability but doesn't trickle into the chain accessible to all admins). Three new event types: share-link.verification-requested / -failed / -verified.
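One way the code check above could look, using Node's built-in crypto module. The 6-digit length, 15-minute TTL, and 5-attempt limit come from the text; deriving the HMAC key as `AUTH_SECRET:shareLinkId`, storing only a digest, and the `PendingVerification` shape are assumptions, not the product's actual scheme.

```typescript
import { createHmac, randomInt, timingSafeEqual } from "node:crypto";

const CODE_TTL_MS = 15 * 60 * 1000;
const MAX_ATTEMPTS = 5;

// Uniformly random 6-digit code, zero-padded.
function generateCode(): string {
  return randomInt(0, 1_000_000).toString().padStart(6, "0");
}

// Assumption: the HMAC key mixes the secret with the link id, so a digest
// minted for one link never validates on another.
function codeDigest(authSecret: string, shareLinkId: string, code: string): Buffer {
  return createHmac("sha256", `${authSecret}:${shareLinkId}`).update(code).digest();
}

// Hypothetical row in the verifications table.
interface PendingVerification { digest: Buffer; requestedAt: Date; attempts: number }

type CheckResult = "verified" | "failed" | "expired" | "locked";

function checkCode(
  authSecret: string,
  shareLinkId: string,
  pending: PendingVerification,
  submitted: string,
  now: Date,
): CheckResult {
  if (now.getTime() - pending.requestedAt.getTime() > CODE_TTL_MS) return "expired";
  if (pending.attempts >= MAX_ATTEMPTS) return "locked";
  pending.attempts += 1;
  const candidate = codeDigest(authSecret, shareLinkId, submitted);
  // Both buffers are 32-byte SHA-256 outputs, so timingSafeEqual will not throw.
  return timingSafeEqual(candidate, pending.digest) ? "verified" : "failed";
}
```

Comparing digests with `timingSafeEqual` avoids leaking how many leading bytes matched.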
Share-link verification roster — see who actually verified, per link
When a share-link requires recipient verification, /share-links surfaces a "Verified" column with a per-link `<verified>` count (or `<verified> / <attempted>` when some recipients failed). Click the count to land on /share-links/[id]/verifications — owner / admin only roster of every (email, requestedAt, verifiedAt, attempts) row with verified / pending / expired status badges. Defends "X verified at 14:32 before reading" without manual /audit drilling. Built on a single grouped SQL read (group by share-link + lower(email), bool_or on verifiedAt) — only fires when at least one row in the page has verification required, so tenants without verification-required links pay zero query cost.
Workspace-default share-link access cap — set "expire after first download" once for HIPAA delivery
Owner / admin sets a workspace-default access cap (1-1000) on /settings/tenant. HIPAA shops set 1 (every share-link expires after first download); ediscovery rolling productions typically want no cap. Empty = no cap by default. createShareLinkAction reads the tenant default in the same SELECT that already loads D199 expiry + D201 notify defaults — single round-trip. New resolveMaxAccessCount helper centralizes the per-link-vs-tenant-fallback logic. Operator passing 0 / omitting the cap reads as "use tenant default". Per-link explicit cap overrides. Saved fields land on tenant.settings-updated for audit. Completes the share-link tenant-defaults quartet (D197 allowlist + D199 expiry + D201 notify + D205 max-access).
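The fallback rule above is small enough to sketch directly. The helper name resolveMaxAccessCount comes from the text, but this signature and the use of `null` for "no cap" are assumptions.

```typescript
// Returns the effective cap for a new share-link, or null for "no cap".
function resolveMaxAccessCount(
  perLink: number | null | undefined,
  tenantDefault: number | null,
): number | null {
  // 0 or an omitted value means "use the tenant default".
  if (perLink != null && perLink !== 0) {
    if (!Number.isInteger(perLink) || perLink < 1 || perLink > 1000) {
      throw new RangeError("access cap must be an integer between 1 and 1000");
    }
    return perLink; // an explicit per-link cap overrides the tenant default
  }
  return tenantDefault; // may itself be null: no cap by default
}
```

For example, a HIPAA workspace with a tenant default of 1 gets single-use links unless the operator types a larger cap on a specific link.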
Workspace-default share-link "Email me when accessed" — completes the share-link tenant-defaults trilogy
Tri-state radio on /settings/tenant: "Use global (on)" / "On" / "Off". HIPAA shops want default on (every access notifies); ediscovery platforms running high-volume delivery want default off (admins' inboxes don't fill up). Operators still pick per-link; this only changes the form's pre-fill and the server-side fallback. Same fallback chain pattern as the D197 allowlist + D199 expiry — server-side load in createShareLinkAction means even API-driven creation inherits the default. Saved fields land on tenant.settings-updated for audit.
Workspace-default share-link expiry days — pick 7 / 30 / 90 once instead of overriding the 14-day global on every link
Owner / admin sets a workspace-default share-link expiry (1-90 days) on /settings/tenant. Different firm postures want different defaults — 7 days for HIPAA delivery, 30 days for ediscovery rolling productions, 14 for general matter packages. When the operator doesn't pick an expiry on the share-link form, createShareLinkAction reads the tenant default first, falls back to the global 14-day default when the tenant didn't set one. Per-link entry still wins. Server-side fallback so even API-driven share-link creation inherits the default consistently. DB CHECK enforces 1-90 (same range as the per-link form). Saved fields land on tenant.settings-updated for audit.
Workspace-default share-link recipient domain allowlist — set @firm.com once instead of re-typing on every link
Owner / admin sets a tenant-wide default on /settings/tenant ("Default share-link recipient domain allowlist"). New share-links with recipient verification on inherit the default when the operator leaves the per-link "Restrict to email domains" field blank. Per-link entries still win — operators override or explicitly clear per link by typing into the field. Works server-side (in createShareLinkAction) so even API-driven share-link creation inherits the default — consistent across UI + API. 1000-char DB CHECK constraint (~50 typical domains). Saved fields land on tenant.settings-updated so the workspace default is auditable like every other tenant setting (D196 inline diff badge surfaces it as a chip on /audit).
Recipient email domain allowlist — pin verification to @firm.com so a personal gmail can't unlock a restricted production
Defense-in-depth on top of D181 verification. Optional textarea on the Share via link form (visible only when "Require recipient verification" is on) accepts comma- or newline-separated domains. Subdomains accepted (firm.com matches mail.firm.com — opposing counsel mailers commonly route through subdomain-pinned outbound services). Up to 25 entries per link. When a recipient enters an email whose domain isn't on the allowlist, verification refuses BEFORE emailing a code — they see a generic "not authorized to verify on this link" message to prevent probing for valid domains. The audit chain still captures the gate-trip with `reason: domain-not-allowed` so admins see refused attempts in the roster. /share-links/[id]/verifications surfaces an emerald "Domain-restricted" callout listing allowed domains so operators can prove pinned-to-@firm.com posture in audit. Useful for HIPAA-scoped deliveries (@providers.healthnet), restricted productions (@jonesandsmith.com only), or pinning to one tenant's domain.
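A sketch of the allowlist parsing and subdomain matching described above. Function names are illustrative; stripping a leading "@" and treating an empty allowlist as unrestricted are assumptions beyond what the text states.

```typescript
// Accepts comma- or newline-separated domains, up to 25 per link.
function parseAllowlist(raw: string): string[] {
  const domains = raw
    .split(/[,\n]/)
    .map((d) => d.trim().toLowerCase().replace(/^@/, ""))
    .filter((d) => d.length > 0);
  if (domains.length > 25) throw new Error("at most 25 domains per link");
  return domains;
}

function isDomainAllowed(email: string, allowlist: string[]): boolean {
  if (allowlist.length === 0) return true; // assumption: no allowlist means no restriction
  const at = email.lastIndexOf("@");
  if (at < 0) return false;
  const domain = email.slice(at + 1).toLowerCase();
  // Exact match, or a subdomain: the "." prefix keeps notfirm.com from matching firm.com.
  return allowlist.some((allowed) => domain === allowed || domain.endsWith(`.${allowed}`));
}
```

The check runs before any code is emailed, so a refused address never triggers an outbound message.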
Per-share-link access cap — "expire after first download" defense-in-depth on top of TTL + revoke
Tick "Cap total accesses to N opens" on the share-link form to hard-stop the link after a chosen number of accesses (1–1000). The standard single-shot ("expire after first download", cap=1) is one click; longer caps work identically. Both the share-page view and the download routes 404 once accessCount hits the cap. Stacks with the expiry-date TTL + manual revoke — whichever fires first wins. /share-links surfaces an "exhausted" badge alongside expired/revoked + an access counter rendered as "<used> / <cap>" so operators can triage at-a-glance. The cap is captured in the share-link.created event payload (not just on the mutable row) so the immutable audit chain proves "this link was minted with cap=1" even after revocation.
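The stacking of revoke, TTL, and access cap can be sketched as a single status function. The four status names come from the surrounding text; the precedence order when several conditions hold at once, and the `LinkState` shape, are assumptions.

```typescript
type LinkStatus = "active" | "expired" | "revoked" | "exhausted";

interface LinkState {
  revokedAt: Date | null;
  expiresAt: Date;
  accessCount: number;
  maxAccessCount: number | null; // null means no cap
}

// Any one of the three gates stops the link; the order here only decides which badge shows.
function linkStatus(link: LinkState, now: Date): LinkStatus {
  if (link.revokedAt !== null) return "revoked";
  if (now.getTime() >= link.expiresAt.getTime()) return "expired";
  if (link.maxAccessCount !== null && link.accessCount >= link.maxAccessCount) return "exhausted";
  return "active";
}

// Counter rendered as "<used> / <cap>" when a cap exists.
function accessLabel(link: LinkState): string {
  return link.maxAccessCount === null
    ? String(link.accessCount)
    : `${link.accessCount} / ${link.maxAccessCount}`;
}
```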
Share-link access notifications — chain-of-custody email when an external recipient opens your link
The litigator's "did opposing counsel actually receive this?" question, answered automatically. New "Email me when accessed" toggle on the Share via link form (default on). When the recipient opens the URL, Kodori emails the link's creator with the access timestamp + recipient hint + a manage URL — admissible evidence of when production was received for FRCP timeliness arguments. Throttled to one notification per (link, 4-hour window) so a recipient hitting the URL 50 times in an afternoon doesn't generate 50 emails. Driven by a new Inngest dispatcher subscribed to event/appended that filters to share-link.accessed events; the audit log still records every access regardless. The lastNotifiedAt stamp is written BEFORE the Resend send to prevent duplicate emails on a transient retry.
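A minimal sketch of the throttle and the stamp-before-send ordering described above. The `send` callback stands in for the real email call, and the in-memory `link` object stands in for the database row; both are assumptions.

```typescript
const WINDOW_MS = 4 * 60 * 60 * 1000; // one notification per link per 4-hour window

function shouldNotify(lastNotifiedAt: Date | null, now: Date): boolean {
  return lastNotifiedAt === null || now.getTime() - lastNotifiedAt.getTime() >= WINDOW_MS;
}

// Returns true when a notification was attempted for this access.
function handleAccess(
  link: { lastNotifiedAt: Date | null },
  now: Date,
  send: () => void,
): boolean {
  if (!shouldNotify(link.lastNotifiedAt, now)) return false;
  // Stamp first: if send() throws and the handler is retried,
  // the retry sees the stamp and does not send a duplicate.
  link.lastNotifiedAt = now;
  send();
  return true;
}
```

The trade-off is that a failed send inside the window is not retried, which matches the stated goal of avoiding duplicate emails over guaranteeing delivery.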
Internal download watermarking on confidential+ PDFs
Extends the share-link watermark (D157) to all internal PDF downloads of docs at sensitivity ≥ confidential. Three-layer stamp per page: workspace + tier in the header bar, diagonal CONFIDENTIAL / RESTRICTED / REGULATED stamp at 18% opacity ochre at the page center (text tracks the actual tier so a regulated doc reads visibly differently from a confidential one), footer reading "Downloaded by <email> on YYYY-MM-DD · Kodori". Closes the chain-of-custody gap where an internal user could download a regulated doc to their laptop without any visible mark — every screenshot or forwarded copy traces back to the originating download via the email + date. Gates on sensitivity ≥ confidential AND mime is PDF; public + internal docs keep the cheap 302-redirect-to-R2 path. Confidential+ traffic streams the watermarked bytes back through the route — cost is acceptable because it's a small share of total download volume but the share that matters for governance.
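The gate and the stamp text above can be sketched as follows. The five tier names appear in the surrounding text; their ordering in a single array, the exact MIME comparison, and the helper names are assumptions.

```typescript
// Assumed ordering, lowest to highest sensitivity.
const TIERS = ["public", "internal", "confidential", "restricted", "regulated"] as const;
type Sensitivity = (typeof TIERS)[number];

// Watermark only when sensitivity >= confidential AND the file is a PDF;
// everything else keeps the cheap redirect path.
function needsWatermark(sensitivity: Sensitivity, mimeType: string): boolean {
  return (
    TIERS.indexOf(sensitivity) >= TIERS.indexOf("confidential") &&
    mimeType === "application/pdf"
  );
}

// The diagonal stamp tracks the actual tier.
function stampText(sensitivity: Sensitivity): string {
  return sensitivity.toUpperCase();
}

function footerText(email: string, downloadedAt: Date): string {
  const day = downloadedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `Downloaded by ${email} on ${day} · Kodori`;
}
```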
Confidentiality watermark on share-link PDFs — workspace + token + access date burned in
Every PDF served through a document or collection share-link is stamped on-the-fly with three layers: a workspace-name header bar, a diagonal CONFIDENTIAL center-stamp at 18% opacity ochre, and a footer reading "Confidential · Shared via Kodori on YYYY-MM-DD · Token <prefix>". A forwarded leak is traceable back to the originating share link via the token-prefix in the footer cross-referenced against the audit log's share-link.accessed events. Watermarking is on-the-fly because blobs are content-addressable (SHA-256 of the bytes IS the version) — forking a watermarked variant would either bloat storage or break the address. Production-kind share-links DO NOT get watermarked because they serve verbatim Bates-stamped bytes that must match the privilege log byte-for-byte for ediscovery integrity. Document + collection links DO because they carry no analogous integrity contract.
Customize the watermark text per workspace — "ATTORNEYS' EYES ONLY", firm name in the header
New "Share-link watermark text" form section on /settings/tenant. Owner / admin sets two optional fields: a custom diagonal stamp (auto-uppercased on render — drop in "ATTORNEYS' EYES ONLY", "PRIVILEGED & CONFIDENTIAL", "FOIA EXEMPT", "TRADE SECRET") and a custom header bar (firm name, matter prefix). Empty falls back to "CONFIDENTIAL" + workspace name. Each capped at 80 chars at the DB level (CHECK constraint, not just app validation). Footer (Kodori attribution + token prefix + date) is fixed — it's the load-bearing chain-of-custody anchor. Production share-links continue NOT receiving watermarks because production bytes must match the privilege log byte-for-byte. Both fields surface on the tenant.settings-updated event when changed so admins can review "who changed the watermark text and when" via /audit.
Matter binder export at /matter-binder — one-click compile to a single bookmarked PDF
Pick a source — collection (matter, custom folder) or recorded production — and Kodori merges every PDF in the source into a single binder with a cover page (matter name + doc count + Bates range + generated date) and a table of contents listing each doc with its Bates BEG/END + page count. New /api/matter-binder POST route handler uses pdf-lib to do the merge; the browser downloads with Content-Disposition. Permission-trimmed; non-PDFs skip silently. 500 doc / 200MB cap. Production-source preserves the EXACT version hashes captured at record time so a binder rebuilt from "Production Set 1 — Jan 15" delivers identical bytes even after later doc re-stamps.
/productions search + matter filter — find every production for matter Smith without scrolling
Firms with 30+ productions can't scroll-find their rolling sets. Text input filters across name + recipient + matterRef + bates prefix + bates range (substring, case-insensitive); matter dropdown auto-populated from distinct matterRefs in the result set. URL-state captured (?q=...&matter=...) so filter combinations are bookmarkable + middle-clickable. "Showing N of M" counter when filters are active. In-memory filter over the existing 200-row list cap (sub-millisecond). Same UX pattern as D194 /share-links search.
Production set tracker /productions — every Bates batch logged as a discovery production
Migration 0044 + new productions / production_documents tables. Each production captures recipient, matter, date, Bates range, document count, and the EXACT version hash of every doc delivered (so a later re-stamp on the same doc doesn't retroactively change the production record — "archived" badge surfaces in the per-prod table when current version diverges from produced). New recordProduction MCP tool (#62 in the catalog) — permission-trimmed, callable from agent / automations / external clients. New "Record as production" affordance on the /bates-stamp result table closes the loop between stamping and recording. New production.recorded audit event so webhook subscribers + automations can react ("when a production is recorded for matter Smith v Acme, post to my Slack channel"). The "what did we produce on Jan 15 to opposing counsel?" answer in seconds — the legal vertical's discovery production loop, end to end.
Production manifest CSV export — the standard ediscovery deliverable
"Download manifest CSV ↓" button on /productions/[id]. Streams one row per produced doc with documentId, batesBeg, batesEnd, pageCount, displayName, mimeType, sensitivityLabel, and the EXACT versionHash captured at production-recording time (NOT the current version — re-stamps + post-production redactions never silently change what the manifest says was delivered). Header preamble carries production name + matter ref + recipient + produced-at + Bates range + doc count as #-prefixed lines that ediscovery ingestion scripts (Concordance, Relativity) detect and skip. Sorted by Bates ranges (asc) so the manifest tracks the produced page sequence. One file, opens in Excel, parseable by every CSV ingestion script in the wild — no zip wrapper, no separate JSON sidecar.
Production set diff — what changed between Production N and M for rolling discovery
New /productions/diff page picks any two production sets and shows three sections: only-in-A (amber), in-both (neutral), only-in-B (emerald). Per-row Bates ranges from each side surface so operators verify continuity ("Production 1 ended at SMITH001234; Production 2 starts at SMITH001235 — clean carryover"). In-both rows flag versionChanged when the same documentId was produced with different versionHashes (re-stamped or re-redacted) — usually intentional but worth verifying against the privilege log. "Diff with another production →" link on every /productions/[id] page pre-fills the picker. Real ediscovery feature for FRCP rolling-production hygiene and opposing-counsel sanity checks.
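The three-section diff above reduces to a keyed set comparison. This sketch assumes each production is available as a list of produced-document rows; the `Produced` shape and function name are illustrative.

```typescript
interface Produced {
  documentId: string;
  versionHash: string;
  batesBeg: string;
  batesEnd: string;
}

interface InBoth { a: Produced; b: Produced; versionChanged: boolean }

function diffProductions(a: Produced[], b: Produced[]) {
  const byIdB = new Map(b.map((d) => [d.documentId, d] as const));
  const idsA = new Set(a.map((d) => d.documentId));
  const onlyInA: Produced[] = [];
  const inBoth: InBoth[] = [];
  for (const docA of a) {
    const docB = byIdB.get(docA.documentId);
    if (docB) {
      // Same document in both sets: flag when the produced bytes differ.
      inBoth.push({ a: docA, b: docB, versionChanged: docA.versionHash !== docB.versionHash });
    } else {
      onlyInA.push(docA);
    }
  }
  const onlyInB = b.filter((d) => !idsA.has(d.documentId));
  return { onlyInA, inBoth, onlyInB };
}
```

Both sides keep their own Bates range per row, so the continuity check in the text can be done by reading the two columns side by side.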
Bulk legal hold — apply a hold to a whole collection or saved search in one click
New bulkAddDocumentsToLegalHold MCP tool + UI form on /legal-holds/[id]. Operator picks a source (collection or saved search) and clicks Bulk apply hold; Kodori resolves the source via the existing collection-members or runSavedSearchTool path, permission-trims via userCanReadDocument, pre-loads existing memberships so already-held docs are counted (not re-emitted), inserts new memberships in one transaction, and emits one legal-hold.applied event per newly-added doc — same audit shape as the per-doc tool. Result counts surface inline: newly added / already held / no permission / not found / total candidates. Automation-callable via mcp-tool-call action — write an event-triggered rule like "when a doc is filed in the Smith Matter collection, apply the Smith litigation hold."
Bates stamping batch at /bates-stamp — page-bottom numbers across an entire production
Pick a source (collection or saved search) + Bates prefix + start number; Kodori loads each PDF in alphabetical order, stamps the bottom-right of every page with a sequential Bates number via pdf-lib (thin 0.85-opacity white pad behind the text for legibility over dark page footers), and saves each result as a new immutable document version. Each doc gets a contiguous range [BEG, END]; the result table shows BEG/END/pages/doc-link per stamped doc + the final cursor. Pairs with /privilege-log v2 — same prefix + start across both surfaces makes log rows and produced PDFs reconcile exactly. Audit chain captures every stamp via document.version-committed with reason="bates-stamped".
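The sequential numbering above can be sketched without touching any PDF bytes. This assumes 6-digit zero padding (as the privilege-log entry on this page states for its Bates numbers), that every doc has at least one page, and that the cursor reported is the next unused number; the function names are hypothetical.

```typescript
function formatBates(prefix: string, n: number, width = 6): string {
  return prefix + String(n).padStart(width, "0");
}

interface BatesRange { name: string; beg: string; end: string; pages: number }

// Assigns a contiguous [BEG, END] range per doc, in alphabetical order by name.
function assignRanges(
  prefix: string,
  start: number,
  docs: { name: string; pages: number }[],
): { ranges: BatesRange[]; nextCursor: number } {
  let cursor = start;
  const sorted = [...docs].sort((a, b) => a.name.localeCompare(b.name));
  const ranges = sorted.map((doc) => {
    const beg = cursor;
    const end = cursor + doc.pages - 1; // one number per page
    cursor = end + 1;
    return { name: doc.name, beg: formatBates(prefix, beg), end: formatBates(prefix, end), pages: doc.pages };
  });
  return { ranges, nextCursor: cursor };
}
```

Feeding `nextCursor` in as the start number of the next batch keeps a rolling production contiguous.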
PDF redaction tool at /doc/[id]/redact — draw boxes, burn to a new immutable version
PDF.js renders each page; a transparent overlay lets the operator click-and-drag rectangles. Each saved box persists immediately to the new document_redactions table; a × button removes any box. "Burn redactions to new version" uses pdf-lib to overlay opaque black rectangles, flatten, and create a new immutable document version pointing back to the original via previousHash. Audit chain captures every add / remove / burn event with the box list as payload — a future investigator can reproduce "what was redacted" without ever recovering "what was behind." Resolution-independent coordinates (PDF user-space units, not client pixels) so re-renders at any zoom align. Pairs with privilege log v2 for the end-to-end discovery production workflow: classify what to withhold → redact what's partially privileged → produce.
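To illustrate the resolution-independent coordinates above, here is a sketch that converts a drawn rectangle from client pixels to PDF user-space units and back. It assumes an unrotated page whose user-space origin is the bottom-left corner, and a `scale` expressed in pixels per PDF unit; the function names are illustrative.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Client space: origin top-left, y grows downward, measured in pixels.
// PDF user space: origin bottom-left, y grows upward, measured in points.
function clientToPdf(box: Rect, scale: number, pageHeight: number): Rect {
  return {
    x: box.x / scale,
    y: pageHeight - (box.y + box.height) / scale, // flip the y axis
    width: box.width / scale,
    height: box.height / scale,
  };
}

function pdfToClient(box: Rect, scale: number, pageHeight: number): Rect {
  return {
    x: box.x * scale,
    y: (pageHeight - box.y - box.height) * scale,
    width: box.width * scale,
    height: box.height * scale,
  };
}
```

Because only the PDF-space rectangle is stored, a box drawn at 200% zoom lands on the same text when the page is re-rendered at any other zoom.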
Smart redaction suggestions — Haiku-driven privacy-scan checklist
Click "Privacy scan ✨" on the redact surface and Kodori loads the document's extracted text, runs Haiku via the existing model provider with a structured-output prompt, and returns a checklist of up to 40 redaction candidates across 12 categories (us-ssn, credit-card, bank-account, phone-number, email-address, date-of-birth, street-address, medical-record-number, attorney-client-privileged, attorney-work-product, trade-secret, other-pii). Each card surfaces a verbatim snippet, one-sentence reasoning, confidence band (high/medium/low), and page number when preserved. Color-graded chips by category for fast scan-and-triage; per-card dismiss for false positives. Operator uses the list as a checklist while drawing boxes manually — auto-draw is deferred because extracted text doesn't carry per-character coordinates. Cost: ~$0.001/scan via Haiku 4.5. 60k char input cap covers a 100-page deposition transcript.
Conflict checking on matter creation — debounced hybrid search surfaces overlapping engagements
New previewMatterConflicts MCP tool (#67) runs two passes: NAME match (substring against existing collection names) and DOCUMENT match (hybrid search on name + description, then each hit maps back to its collection memberships). /collections/new debounces the check (600ms) as the operator types when kind=matter; results surface inline in an amber warning panel with click-through to suspect matters + per-doc snippets. Submit button refuses unless the operator explicitly confirms "I've reviewed the conflicts above and want to proceed anyway." Replaces the manual paralegal-runs-conflict-search pre-engagement step every law firm does. Operator-callable from the agent via natural language ("check this matter for conflicts").
Privilege log builder at /privilege-log — FRCP-26 log in seconds, with saved-search source + inline overrides
Pick a SOURCE — collection OR saved search — and Kodori generates an FRCP-26-compliant privilege log: Bates / Date / Author / Recipients / Doc Type / Privilege Basis / Description. Privilege basis classified from a constrained enum (Attorney-Client, Attorney Work Product, Common Interest, Joint Defense, Settlement Negotiations, Not Privileged, Unable to Determine); description is a 1-2 sentence FRCP-26-style line that names subject + parties + date without reproducing privileged content. Auto-sequenced Bates numbers (configurable prefix + start number, 6-digit padded). Permission-trimmed before classification — docs the requester can't read skip silently. PER-ROW INLINE EDIT — every row has an edit link that opens form fields for Bates / basis / description; save persists to a per-row override table keyed by (sourceKind, sourceId, documentId) so re-builds preserve corrections without re-classifying. "edited" badges surface modified rows; clear-override reverts to the latest Haiku output. Markdown export pastes cleanly into Word / ECF / production cover letters. Replaces the ~8-12 hour senior-paralegal workflow standard on every discovery production.
AEC project lifecycle metadata — owner / contract value / target completion / status on /projects/[ref]
Migration 0046 adds an optional aec_projects table keyed by (tenantId, projectRefKey). Per-project drill-in renders a header card with owner / contract value / target completion / status / notes; inline Edit form persists via upsertAecProjectAction with ON CONFLICT merge. The /projects rollup continues to derive from the three tracker projection tables — metadata is OPTIONAL enrichment, not a replacement. Operators can pre-create metadata before any docs land ("we just won the bid, set up the project before the RFIs"). Status enum: active / on-hold / closed. Currency: USD / CAD / EUR / GBP / AUD. Contract value stored as bigint cents.
AEC /projects dashboard + per-project drill-in — every active job's health on one screen
Sister to /spec-sections, pivoted by project reference instead of CSI section. Per row: open / total RFIs (with overdue + due-soon tinting), under-review / total submittals, pending / executed / rejected COs with executed-vs-pipeline cost impact in dollars, schedule-day aggregates (executed vs pipeline). Click any row to drill in to /projects/<ref> — a per-project view with sectioned open RFIs / under-review submittals / pending change orders / executed COs / rejected COs and a unified recent-activity timeline mixing all three artifact types. /rfi-tracker, /submittal-tracker, and /change-order-tracker also accept ?projectRef= query-string filters with an ochre "Filtered to project · clear filter" pill rendered inline. The strategic morning view a project executive runs without drilling into individual trackers; the per-job deep dive when a specific project needs attention. Pure read-side aggregation over the three projection tables — no project schema, no migration.
RFI + submittal response packet linking — "Mark answered" button + linkResponseDocument MCP tool
Closes the v2 follow-on noted in D107 + D108. New linkResponseDocument MCP tool (#63) handles both RFI and submittal kinds; takes the subject doc id + response doc id + optional status. New inline "Mark answered" picker on /rfi-tracker and /submittal-tracker rows — operator types a search query, hybrid search runs on the workspace, picks the response doc, sets status (RFI: answered/rejected; submittal: approved/rejected with optional verbatim disposition), confirm. Permission-trimmed (read required on BOTH docs); audit-logged via document.metadata-set events. Automation-callable — write rules like "when a doc matching pattern X is filed, find the matching open RFI by number and link it."
RFI structured spec_section — typed CSI MasterFormat column on the rfis projection
Migration 0047 + extractor update. The RFI extractor (extract-rfi.ts) now isolates the CSI MasterFormat spec section from the more general `location` field and stores it in a typed column. /spec-sections joins RFIs by the typed column when populated, falls back to location-substring heuristic for pre-D130 rows. Tighter matching means fewer false positives (a location like "Sheet A-201" no longer accidentally matches a section like "201"). Closes the first revisit trigger from D108 + D114.
AEC /spec-sections directory — project heatmap by CSI MasterFormat
Joins RFIs, submittals, and change orders by spec section into one strategic view. Per row: open RFIs (matched by location text containing the section number), under-review submittals, pending / executed / rejected COs, executed cost impact (red for additive, emerald for credits), pipeline cost impact. Sections sort hottest-first (most open work across all three artifacts). Overdue badges surface on any section with overdue RFIs, overdue submittals, or PCOs past the 14-day signature threshold. Whitespace-tolerant section matching ("08 41 13" and "084113" aggregate into one row). The Monday-morning project-manager view that no spreadsheet maintains itself.
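A small sketch of the whitespace-tolerant aggregation and hottest-first sort described above. The item and row shapes are assumptions, and "hottest" is read here as the largest total of open items across the three artifact types.

```typescript
type Kind = "rfi" | "submittal" | "changeOrder";
interface OpenItem { specSection: string; kind: Kind }
interface SectionRow { section: string; rfi: number; submittal: number; changeOrder: number; total: number }

// "08 41 13" and "084113" collapse to the same key.
function sectionKey(raw: string): string {
  return raw.replace(/\s+/g, "");
}

function heatmap(items: OpenItem[]): SectionRow[] {
  const rows = new Map<string, SectionRow>();
  for (const item of items) {
    const key = sectionKey(item.specSection);
    const row = rows.get(key) ?? { section: key, rfi: 0, submittal: 0, changeOrder: 0, total: 0 };
    row[item.kind] += 1;
    row.total += 1;
    rows.set(key, row);
  }
  // Hottest first: most open work across all three artifact types.
  return Array.from(rows.values()).sort((a, b) => b.total - a.total);
}
```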
Bulk operations menu at /bulk-ops — three new MCP tools, source-driven (collection or saved search)
Three new bulk MCP tools (#64-66 in the catalog): bulkAddDocumentsToCollection, bulkSetDocumentRetentionClass, bulkSetDocumentSensitivity. All follow the D125 shape: discriminated-union source, permission-trim, idempotent, one audit event per affected doc. The sensitivity tool refuses to downgrade held docs (deny-wins). New /bulk-ops page surfaces them in a three-tab operation switcher with inline result counts ("Added 47 of 50 candidates. 2 already members; 1 no permission"). Automation-callable via mcp-tool-call action, so operators can write event-triggered rules like "when DLP flags a doc with high confidence, set its sensitivity to confidential across the matching collection." Cursor-paginated at 100M-doc-tenant scale (D283): the 2,000-per-call safety floor stays unchanged, and the UI or agent keeps calling until nextCursor === null to drain a source set larger than the cap.
Bulk source: `kind: "uncollected"` — pin every loose doc to a collection in one tool call
New built-in source variant on every bulk-doc tool: bulkAddDocumentsToCollection / -RetentionClass / -Sensitivity / -LegalHold. It carries no fields; `kind: "uncollected"` IS the whole query. Resolves via Postgres `NOT EXISTS` to every live tenant doc with no row in collection_members. "Collected" means pinned only (rule-derived membership is recomputed at read time and doesn't qualify). Permission-trimmed downstream. Natural-language requests to the agent such as "pin every uncollected doc to RoyzTestDocs", or references to "the inbox", "loose docs", or "docs not in any collection", all map directly to this variant, with no saved-search lookup round trip first. Single SQL primitive, single tool call.
AEC inspection / daily-report tracker at /inspection-tracker
Fourth AEC daily-action surface alongside RFIs / submittals / change-orders. Auto-classifier flags "inspection report" / "daily report" / "punch list" docs (already in doc-type-hints); migration 0048 + new extract-inspection Inngest function pull inspector / inspection date / location / spec section / trade / verbatim result / open-finding count / 1-2 sentence findings summary via Haiku. Page sorts open-with-findings first; top-line stats include total open findings; one-line "by trade" breakdown ("MEP 27 · structural 4") shows where the issues live without drilling. Same ?projectRef= filter pattern as the other three AEC trackers.
AEC change-order tracker — every PCO and CO with cost + schedule impact at /change-order-tracker
Third AEC daily-action surface alongside RFIs and submittals. Auto-classifier flags change orders / PCOs / construction change directives; a Haiku-backed extractor pulls CO number, subject, project ref, spec section, originator, approver, signed cost impact (additive vs deductive), schedule days, reason category, dates, signature status. Page surfaces five status counters (Pending / PCO / Overdue signature / Executed / Rejected), executed-vs-pipeline cost aggregates with red for additive and emerald for credits, schedule-day aggregates, top reason categories ("owner request" / "design change" / "field condition" / "RFI response" / etc.), and overdue-PCO highlighting at 14 days. Replaces the manual "change order log" spreadsheet — every PCO and CO in the workspace surfaces here automatically.
AEC submittal tracker — product + material approvals at /submittal-tracker
Sister surface to /rfi-tracker for the second-highest-volume AEC document. When the auto-classifier flags a doc as submittal / shop drawing / product data sheet / material sample / mock-up / test report, a dedicated Inngest extractor pulls structured fields (submittal number, subject, CSI MasterFormat spec section, type, parties, dates, disposition). Five status counters + top-spec-sections-by-pending-count breakdown + per-row detail. Disposition wording preserved verbatim — firms don't lose canonical phrasing like "Approved as noted — rev. mullion finish to RAL 9006" while the derived status bucket drives the queue.
AEC RFI tracker — Request For Information workflow at /rfi-tracker
AEC vertical's equivalent of /ap-review. When the auto-classifier flags a doc as RFI / Request For Information, a dedicated Inngest extractor pulls structured fields (RFI number, subject, project ref, requested by / of, location, due date). Row lands on /rfi-tracker with a morning-stand-up view of open + overdue + due-soon counts plus top-projects-by-open-count breakdown. Open RFIs past their due date highlight in red — the "what's blocking the project right now" view. Phase 4 vertical work pulled forward — RFIs are the highest-volume document on a typical commercial construction project (200-2000 per build).
Drawing register — sheet extraction + per-project rollup
Click "Extract drawings" on any drawing PDF and Kodori scans the text against AIA/CSI sheet-numbering patterns (A-201, S-101, M3.05, etc.) covering 14 disciplines (architectural, structural, MEP, fire, civil, landscape, telecom, equipment, security, demolition, general). Per-doc panel surfaces the discipline-grouped index with sheet titles + occurrence counts. Per-project rollup at /collections/[id]/drawing-register aggregates every readable doc's sheets across the project, grouped by discipline, with traceable source documents per sheet. The "drawing register" output every AEC project requires — built from the same regex-driven, deterministic extraction surface that powers per-doc indexing. Permission-trimmed; idempotent on re-run.
Drawing-set integrity check — expected ranges + missing/unexpected diff
On every project drawing register, define expected sheet ranges per discipline ("A-100 to A-501", "S-100 to S-308") and Kodori cross-checks against the indexed sheets. Surfaces: per-range expected/found counts + missing-major list ("Missing: A-205, A-206 …"), top-of-section completeness chip color-coded green/amber/red, plus an UNEXPECTED sheets warning panel for sheets in docs that fall outside every defined range. Range coverage is majors-only — M-1 to M-99 covers M3.05. Audit-logged via project-drawing-range.added/.removed. Answers the project-closeout question "do we have every sheet on the contract drawing list?" without scrolling 800 rows.
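The cross-check above can be sketched roughly as follows. This is an illustration under stated assumptions, not Kodori's implementation: every name here (`SheetRange`, `parseSheet`, `checkRange`, `unexpectedSheets`) is hypothetical, sheet numbers are assumed to look like `A-205` or `M3.05`, and a range is assumed to expect every integer major between its endpoints.

```typescript
// Hypothetical names throughout; a sketch, not Kodori's implementation.
interface SheetRange {
  discipline: string; // prefix letters, e.g. "A"
  from: number; // major of the first expected sheet, e.g. 100 for "A-100"
  to: number; // major of the last expected sheet, e.g. 501 for "A-501"
}

interface ParsedSheet {
  discipline: string;
  major: number;
}

// Parse "A-205" or "M3.05" into a prefix and an integer major (the minor is ignored).
function parseSheet(sheet: string): ParsedSheet | null {
  const m = /^([A-Z]+)-?(\d+)(?:\.\d+)?$/.exec(sheet.trim().toUpperCase());
  if (m === null) return null;
  return { discipline: m[1] ?? "", major: Number(m[2] ?? "0") };
}

function inRange(p: ParsedSheet, r: SheetRange): boolean {
  return p.discipline === r.discipline && p.major >= r.from && p.major <= r.to;
}

// Expected/found counts plus the missing-major list for one range.
function checkRange(range: SheetRange, indexed: string[]) {
  const found = new Set<number>();
  for (const sheet of indexed) {
    const p = parseSheet(sheet);
    if (p !== null && inRange(p, range)) found.add(p.major);
  }
  const missing: string[] = [];
  for (let n = range.from; n <= range.to; n++) {
    if (!found.has(n)) missing.push(`${range.discipline}-${n}`);
  }
  return { expected: range.to - range.from + 1, found: found.size, missing };
}

// Sheets that parse but fall outside every defined range.
function unexpectedSheets(ranges: SheetRange[], indexed: string[]): string[] {
  return indexed.filter((sheet) => {
    const p = parseSheet(sheet);
    return p !== null && !ranges.some((r) => inRange(p, r));
  });
}
```

Because only the major is compared, a range of M-1 to M-99 counts `M3.05` as found under major 3, matching the majors-only rule described above.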
AP-invoice review with three-way match + line-item reconcile
Upload an invoice and Kodori extracts vendor / invoice number / total / PO number AND every line item (description, item code, quantity, unit price, total per line). Scans the workspace for a matching PO doc + receipt sharing that PO. Computes match status (3-way matched / price-variance / awaiting receipt / 2-way only) with signed variance in cents at the document level, plus per-line pairing (item-code match → exact-description match → line-number fallback) with per-line ✓ matched / ! variant / unpaired flags. Surfaces the line-level posture even when document-level totals happen to reconcile (e.g. a vendor who billed twice for one item but waived another). Tolerance: max($5, 1%) at the document level, $1 per line. Late-arriving receipts retroactively reconcile their invoices in place. Approval / rejection emits a webhook-deliverable event for downstream ERP sync; the audit chain is the system of record either way.
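As a rough illustration of the tolerance rule, the sketch below works in integer cents and applies max($5, 1%) at the document level and $1 per line. The function names are hypothetical, and taking the 1% against the PO total is an assumption; the text does not say which total the percentage applies to.

```typescript
// Tolerances from the text: max($5, 1%) per document, $1 per line. Amounts are cents.
const DOC_FLOOR_CENTS = 500;
const DOC_PERCENT = 0.01; // assumption: measured against the PO total
const LINE_TOLERANCE_CENTS = 100;

// Signed variance: positive means the invoice is over the PO.
function documentVarianceCents(invoiceTotal: number, poTotal: number): number {
  return invoiceTotal - poTotal;
}

function withinDocumentTolerance(invoiceTotal: number, poTotal: number): boolean {
  const tolerance = Math.max(DOC_FLOOR_CENTS, Math.round(poTotal * DOC_PERCENT));
  return Math.abs(documentVarianceCents(invoiceTotal, poTotal)) <= tolerance;
}

function withinLineTolerance(invoiceLineTotal: number, poLineTotal: number): boolean {
  return Math.abs(invoiceLineTotal - poLineTotal) <= LINE_TOLERANCE_CENTS;
}

// A vendor billed one item twice and waived another: totals agree, lines do not.
const invoiceLines = [2000, 0];
const poLines = [1000, 1000];
const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
const documentOk = withinDocumentTolerance(sum(invoiceLines), sum(poLines));
const lineFlags = invoiceLines.map((cents, i) => withinLineTolerance(cents, poLines[i] ?? 0));
```

Here `documentOk` is true while both entries in `lineFlags` are false, which is the case the line-level posture exists to surface.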
Compliance overview page
/compliance shows live records, active holds, held documents, retention queue depth, sensitivity histogram, retention class count, and the audit chain tip. Single pane for governance posture.
Per-document history timeline
Every event recorded against a single document, newest first, payload-aware summaries (sensitivity transitions, version commits, hold matter refs, retention deferrals). User actorIds resolve to email; agent / system actors render with their full identity string.
Pillar 06
AI as the operations layer
The agent isn't a chatbot bolted onto a DMS — it's the same MCP tool catalog the UI calls. Auto-classification, sharing decisions, retention deferrals, holds: all the same typed tools, just invoked by different principals.
AI document generation from templates
Click "Use as template →" on any Kodori document with extracted text, type instructions ("Draft a 1-year mutual NDA between Acme and BigCo, NY law"), and Kodori asks Claude Opus to draft a new document that follows the template's structure but adapts to the new context. Result lands as a new Kodori record (markdown — opens in any editor, pastes into Word with formatting preserved). Bracketed placeholders flag where reviewers need to fill in unspecified values; metadata links the draft to its template so the audit chain captures the lineage. The single biggest "wow factor" no incumbent DMS has — Kodori treats documents as data the agent can compose new work from, not just files to open in Word.
Conversational canvas — branchable agent runs with human-approve gates (/canvas)
Multi-step agent workflows render as a tree of nodes (tool-call / human-approve / branch / reduce) instead of a flat chat transcript. Spin up a canvas run, the agent advances tool-by-tool, and any node can be flagged human-approve so an admin must Approve / Reject before the run continues. Reject collapses the descendant subtree without rolling back upstream side-effects (existing reversibility primitives still apply at the tool level). Each node + status flip lands on the hash-chained audit log via 8 new event types — `canvas-run.started / .advanced / .paused / .resumed / .approved / .rejected / .completed / .cancelled`. /canvas list + /canvas/[id] detail with per-node Approve / Reject buttons.
Document templates — mark + clone at zero blob cost
Mark any live document as a template via the toggle on /doc/[id]. Templates appear on /templates with an inline "New from template" form per row: enter the new doc name + (optional) target collection + Create. The new doc forks at the SAME currentVersionHash as the template — content-addressable storage means zero blob duplication, infinite templates cost the same as a single uploaded doc. Sensitivity tier + mime type carry forward; metadata gets a fresh empty payload tagged with `clonedFrom` so lineage is queryable. When a target collection is picked, the membership writes in the same transaction so the new doc lands inside the matter the moment the redirect resolves. Three new event types — document.marked-as-template / .unmarked-as-template / .cloned-from-template — track the full lifecycle on the audit chain.
Smart automations — programmable agent in plain English (scheduled OR event-based, AND/OR/NOT filters, four action kinds, ${event.payload.*} variables)
Type a one-line rule on /automations. Claude Opus compiles it into a typed trigger + action. SCHEDULED rules fire on an Inngest cron tick every 5 minutes; EVENT rules fire within seconds of a matching audit event, optionally narrowed by filter conditions ("when an anomaly is detected with severity above medium, ping my Slack webhook" → eventType=anomaly.detected + filter=[{path:"payload.severity", op:"in", value:["high","critical"]}]). Filter expressions support 9 ops (eq/neq/in/nin/gt/gte/lt/lte/contains), dotted-path access into payload + top-level event columns, and AND-only composition. FOUR action kinds: email-saved-search, email-agent-query, webhook (Slack-aware JSON POST), mcp-tool-call (invoke any of the 165+ typed tools with compile-time Zod validation). VARIABLE SUBSTITUTION (`${event.payload.<field>}`, `${event.eventType}`, `${event.actorId}`, `${automation.name}`, `${trigger.firedAt}`) inside action configs lets one rule cover thousands of fires. Combined with event triggers + filters + tool catalog + variable resolution, this is "Zapier inside the DMS": programmable agent + audit-defensible execution + every typed tool addressable + precise per-event narrowing. No incumbent has the architecture to ship this. Permission-trimmed: automations run as the creator. Run-now button to test before trusting the trigger.
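To make the filter and variable mechanics concrete, here is a minimal sketch of how conditions of that shape could be evaluated. The condition shape (`path`, `op`, `value`) and the `${...}` syntax come from the text; the evaluator, its type names, and its treatment of unknown variables are assumptions, not Kodori's code.

```typescript
// Illustrative evaluator; type and function names are hypothetical.
type FilterOp = "eq" | "neq" | "in" | "nin" | "gt" | "gte" | "lt" | "lte" | "contains";

interface FilterCondition {
  path: string; // dotted path, e.g. "payload.severity"
  op: FilterOp;
  value: unknown;
}

type AuditEvent = Record<string, unknown>;

// Walk a dotted path; returns undefined if any segment is missing.
function getPath(obj: unknown, path: string): unknown {
  let cur: unknown = obj;
  for (const key of path.split(".")) {
    if (cur === null || typeof cur !== "object") return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}

function compare(a: unknown, b: unknown, test: (x: number, y: number) => boolean): boolean {
  return typeof a === "number" && typeof b === "number" && test(a, b);
}

function matches(cond: FilterCondition, evt: AuditEvent): boolean {
  const actual = getPath(evt, cond.path);
  const v = cond.value;
  switch (cond.op) {
    case "eq": return actual === v;
    case "neq": return actual !== v;
    case "in": return Array.isArray(v) && v.includes(actual);
    case "nin": return Array.isArray(v) && !v.includes(actual);
    case "gt": return compare(actual, v, (x, y) => x > y);
    case "gte": return compare(actual, v, (x, y) => x >= y);
    case "lt": return compare(actual, v, (x, y) => x < y);
    case "lte": return compare(actual, v, (x, y) => x <= y);
    case "contains":
      return typeof actual === "string" && typeof v === "string" && actual.includes(v);
    default: return false;
  }
}

// AND-only composition: every condition must hold.
function matchesAll(conds: FilterCondition[], evt: AuditEvent): boolean {
  return conds.every((c) => matches(c, evt));
}

// Replace ${a.b.c} with the value at that path; unknown variables are left as-is.
function substitute(template: string, ctx: Record<string, unknown>): string {
  return template.replace(/\$\{([\w.]+)\}/g, (whole, path: string) => {
    const val = getPath(ctx, path);
    return val === undefined ? whole : String(val);
  });
}
```

With the example rule from the text, an `anomaly.detected` event whose `payload.severity` is `"high"` passes the `in` filter, and a webhook body such as `"Severity ${event.payload.severity}"` resolves per fire.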
Slide-in agent drawer with ⌘K / Ctrl+K hotkey
Press ⌘K on macOS or Ctrl+K on Windows / Linux anywhere in the app — the drawer slides in from the right with the page you're on already threaded into the prompt. "What is this document about?" works without naming the file. Click Expand to fill the content area without losing your sidebar nav.
Persistent conversation history per user
Every chat is saved server-side. The expanded drawer shows a History rail listing recent conversations newest-first; click any to switch threads, hover to delete, "+ New" to start fresh. Per-user (not per-tenant) — privileged matter queries stay private to the asker, and per-doc permission trimming still applies inside any conversation regardless of who hydrates it.
Full-text search across past conversations
A search input above the History rail runs Postgres FTS (websearch_to_tsquery + ts_rank_cd) over every message you've authored or received. Type "smith engagement letter" or "retention class" or `report -draft` and Kodori returns ranked threads with the matched terms highlighted in context. Empty the input to fall back to the recent-first list.
Per-conversation system prompt overrides
Pin a custom preamble to one thread — "drafting agent" / "research agent" / firm-style — that travels with every turn alongside the base system prompt. Three template chips seed the editor; 4000-character cap keeps the always-on prepend within prompt-cache breakpoint budget. Pins are per-conversation, per-user, opt-in.
Workspace-introspection toolset (17 tools)
The agent answers "where do I stand" questions with one tool call: tenantUsageSummary (storage, doc count, agent quota, plan vs caps), listMembers (workspace + roles + sign-in providers), recentActivity (audit events with actor labels and counts by type), listLegalHolds, listRetentionClasses, listRetentionReviewQueue, listDocumentVersions, listDocumentEvents, getDocumentExtraction (extraction status + extractor + error), listAnomalies, listDlpFindings (permission-trimmed; pre-redacted previews only), listSavedSearches, runSavedSearch, listApiKeys (no plaintext), listWebhooks (24h delivery health), getTenantSettings, verifyAuditChain. Permission-trim applies throughout.
Citation chips — doc names, not UUIDs
When the agent mentions a document, the link renders as a permission-trimmed chip showing the document's display name ("DOC Smith v Acme NDA") instead of a raw UUID. The cache is module-scoped so 20 chips for the same conversation = 1 server-action call. UUIDs the user can't read fall back to a truncated label without leaking metadata about doc existence.
Usage strip in the drawer footer
Live "Questions 12 / 600 · Opus 3 / 120 · Team →" line with the same amber-at-80% / red-at-100% bands the dashboard uses. Click through to /settings/billing to manage the plan. Hidden when usage is unlimited (Enterprise).
Page-context awareness
Document, collection, search, and route context all pipe into the agent's system prompt. On a doc page, the excerpt and id are right there; on a Collection, the first pinned members; on /search, the active query.
Help-knowledge retrieval
The agent has a helpKnowledge tool that searches this knowledge base. Ask "how do I share a doc?" or "what is a legal hold?" and you get a grounded answer with a /help/<slug> citation, not a hallucination.
Same MCP tool catalog as the UI
The agent calls hybridSearch, addDocumentToCollection, applyLegalHold, deferRetention — the exact same typed tools that back the UI buttons. Permissions enforce identically; audit events fire identically.
Move, rename, retag, delete, restore — by asking
Say "rename this to Smith NDA — final", "move this into Matter 24-1234", "regulated, please, this has SSNs in it", or "delete this draft" and the agent dispatches the matching MCP tool with your reason captured on the audit log. tombstoneDocument refuses on active holds. setDocumentSensitivity refuses to lower a tier on held records. restoreDocument is owner/admin-only. The agent will not invent a reason — it asks first for any consequential action.
Hash-chained agent transactions
Every tool the agent calls writes an event with actorKind="agent". The audit log distinguishes user from agent actors, so you can answer "what did the AI do this week?" with one query.
Agent activity page — discoverable by default
A dedicated /agent-activity route surfaces every action the AI has taken in the workspace over the last 30 days, with stats, a tools-by-volume chart, and a recent timeline. No audit-filter syntax to learn — managers see the right scope by default and link out to /audit for deeper queries.
Anthropic Claude with prompt caching
Opus 4.6 for reasoning, Haiku 4.5 for high-volume classification. Prompt caching enabled on every reused-context call. Provider abstraction in packages/agent/src/provider.ts is the swap point for AWS Bedrock when an enterprise BAA requires it.
Pillar 07
Integrate, extend, ship
Kodori is built on the same MCP tool catalog the agent uses. The public API covers read AND write — your integration ships against the same surface our UI does.
Public REST API (read + write, thirteen endpoints)
GET /api/v1/me, POST /api/v1/search, GET /api/v1/documents, GET /api/v1/documents/{id}, GET /api/v1/collections, plus mutating routes: rename, sensitivity (refuses to lower on held docs), metadata patch, tombstone, restore, create collection, add member, remove member. Bearer-token auth. Permission-trimmed at the index — your integration sees what the issuing user sees, no more.
Opt-in scopes per key
search:read is the always-granted baseline. Add documents:write, documents:delete, or collections:write at issuance time — each opt-in. A leaked search-only key has a much smaller blast radius than a leaked all-writes key, and Kodori makes that distinction explicit.
OpenAPI 3.1 manifest at /api/openapi.json
Drop into Postman, Insomnia, or Stoplight Studio for typed request building. Documents auth, request schemas, response shapes, error envelope, and the held-doc deny-wins error path.
Bulk ingest API — POST raw bytes, no presigned dance
POST /api/v1/documents with the file as the request body and metadata in X-Kodori-* headers. 50 MB cap, content-addressed dedup is free, full extraction + classification pipeline runs the same as a UI upload. Built for watch-folder daemons, ETL jobs, and scanner integrations.
Webhooks for push notifications
Subscribe an HTTPS URL to event types you care about. Kodori POSTs a signed payload (HMAC-SHA256 over <timestamp>.<body>) whenever a matching event lands. /webhooks page lists active subscriptions, paused ones, and the last 50 deliveries with response codes + retry counts. Subscribe to everything for an audit-export integration; filter to specific types for a workflow integration.
Per-subscription delivery audit + redeliver
/webhooks/[id]/deliveries drills into one subscription with full pagination, top-of-page success-rate stats (color-coded), All / Succeeded / Failed filter chips, and a "Redeliver" button on every row. After you fix a receiver, redeliver re-fires the same (event × subscription) match through Inngest without recreating the source event — the audit log stays single-truth, only a fresh delivery attempt lands. Refused on paused / revoked subscriptions (resume first). Same auth bar as create / pause / revoke (admin only).
Per-subscription retry policy — tune retries from 1-10 per receiver
Default 4 (matches the prior global behavior). Use a lower value (e.g. 2) for known-flaky downstreams where excess retries would overwhelm them; use a higher value (e.g. 8) for critical receivers where you want maximal effort to deliver. Inngest function-level cap is 10 (the absolute ceiling); per-sub max_retry_attempts is the operator-controllable knob below it. The deliver function reads the per-sub cap on each retry and self-aborts with a recorded failed-delivery row when attempts hit the cap, so Inngest stops scheduling more retries naturally. Saving emits webhook.retry-policy-set on the audit chain with previous + new values; takes effect on the next event delivery. Idempotent on no-op saves so the audit chain stays clean.
Webhook test-fire — admins verify their endpoint without writing a real mutation
Click "Send test" on any active subscription on /webhooks. Kodori appends a synthetic webhook.test event scoped to a per-subscription stream (`webhook/<id>/test`) and direct-dispatches the deliver function at that subscription only — bypassing fanout so other subs whose eventTypes filter accepts webhook.test don't receive the test (clean isolation). Receiver gets a real signed POST with full HMAC headers + a payload containing isTest: true, initiator email, the URL, and an explanatory message. Same code path as production delivery — passing test = real deliveries work. webhook.test event lands on the audit chain so compliance can verify "admin tested webhook X at timestamp Y" without a separate test log. Refuses on paused / revoked subscriptions ("Resume it first") to prevent accidental re-arming.
Bulk-redeliver failed webhook deliveries — replay an entire outage in one click
After a receiver outage, click "Redeliver all failed" on /webhooks/[id]/deliveries instead of clicking 50 individual Redeliver buttons. Window dropdown (last 24h / 7d / 14d / 30d) bounds the replay; selectDistinct on eventId means multiple failed retries of the same event collapse into one redelivery (no fan-out from a heavily-retried event). Hard cap of 500 unique events per call — narrow the window if you have more, you almost certainly have a misconfigured receiver rather than a transient outage. Same Inngest deliver-function code path as single-row redeliver — same retry cap, same audit posture. Refuses on paused / revoked subs ("Resume it first") to match the explicit-action-required UX from webhook test-fire (D189).
Webhook 7-day delivery health inline on /webhooks — triage receiver health at-a-glance
Every active subscription renders "7d: 94% · 23 deliveries · last fail 3h ago" in the row alongside the existing last-delivery + Deliveries → link. Color-coded by tone — green when success ≥ 95% with no recent failures, amber when 80-95% OR a failure in the last 4h on an otherwise-healthy sub, red when < 80%. "no deliveries" stays neutral gray (silence isn't failure). Single grouped SQL query keyed on subscriptionId pulls total + failed + lastFailedAt over the last 7 days; tenants with zero active subs skip the query. Closes the "did my receiver start failing?" question without drilling into per-sub deliveries logs. Hover the badge for raw counts.
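The color bands above reduce to a small pure function. The sketch below is one way to express them; the function and type names are hypothetical, and it takes the three aggregates the text says the grouped query returns (total, failed, lastFailedAt).

```typescript
// Hypothetical helper mirroring the bands described above.
type HealthTone = "gray" | "green" | "amber" | "red";

const RECENT_FAILURE_MS = 4 * 60 * 60 * 1000; // "a failure in the last 4h"

function healthTone(total: number, failed: number, lastFailedAt: Date | null, now: Date): HealthTone {
  if (total === 0) return "gray"; // silence isn't failure
  const successRate = (total - failed) / total;
  if (successRate < 0.8) return "red";
  const recentFailure =
    lastFailedAt !== null && now.getTime() - lastFailedAt.getTime() <= RECENT_FAILURE_MS;
  if (successRate < 0.95 || recentFailure) return "amber";
  return "green";
}
```

For instance, a subscription at 97% success whose last failure was an hour ago still renders amber, because the recent-failure rule overrides an otherwise healthy rate.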
Webhook signature verification
Every delivery carries X-Kodori-Signature: sha256=<hex>, X-Kodori-Timestamp, X-Kodori-Event, X-Kodori-Event-Id, X-Kodori-Subscription. Receivers recompute the HMAC-SHA256 over <timestamp>.<body> and reject mismatches or timestamp drift greater than 5 minutes. Failed deliveries auto-retry up to 4 times by default (tunable per subscription) via Inngest step retries.
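A receiver-side check might look like the following sketch, which uses Node's built-in `crypto` module. The header names and the `<timestamp>.<body>` signing input come from the text; the function name is hypothetical, and treating X-Kodori-Timestamp as Unix seconds is an assumption to confirm against the OpenAPI manifest.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_DRIFT_SECONDS = 5 * 60;

// Hypothetical receiver helper. Assumes the timestamp header is Unix seconds.
function verifyKodoriSignature(
  secret: string,
  signatureHeader: string, // X-Kodori-Signature, "sha256=<hex>"
  timestampHeader: string, // X-Kodori-Timestamp
  rawBody: string, // the exact bytes received, before any JSON parsing
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestampHeader);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > MAX_DRIFT_SECONDS) return false;

  const prefix = "sha256=";
  if (!signatureHeader.startsWith(prefix)) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestampHeader}.${rawBody}`)
    .digest();
  const given = Buffer.from(signatureHeader.slice(prefix.length), "hex");

  // timingSafeEqual throws on length mismatch, so check lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Verify against the raw request body rather than a re-serialized object, since any change in whitespace or key order changes the HMAC.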
Per-API-key usage audit + daily activity chart
/api-keys/[id]/usage drills into one key with total all-time calls, last-30-days call count, last-used stamp, a 30-day daily activity bar chart, the 10 most-frequent event types with share-of-traffic percentages, and a 30-row paginated recent-calls list. Backed by the existing audit log — every external API call lands with actorId="apikey:<prefix>" so the data is always there, hash-chained alongside every other tenant action. Pairs with the webhook delivery audit panel for end-to-end integration triage.
At-a-glance API key usage stats on /api-keys — answer "is this in use?" in one row
Every /api-keys row now shows "X calls · last call <relative>" — a lifetime counter (bumped best-effort on every accepted auth) plus a relative-time stamp ("2m ago" / "3h ago" / "5d ago" / YYYY-MM-DD past 30 days). UTC-stable rendering (absolute deltas, not localized weekdays) so SSR + CSR match. Atomic SQL `+ 1` increment so concurrent calls don't race-lose updates. A key with "no calls yet · created 32 days ago" is a safe-to-revoke candidate; one with "12,847 calls · last call 2m ago" is load-bearing. Bigint counter type — effectively unbounded. Cleanup ergonomics without grepping audit; the per-key /usage drill-down still exists for compliance / forensic detail.
Bulk-revoke API keys for offboarding — one click, every key the departing user created
"Offboarding — bulk-revoke all keys created by a user" form on /api-keys (admin / owner only). Pick a workspace user from a dropdown and Kodori revokes every active key they created in one action. Each revoke emits api-key.revoked on the audit chain with bulk: true + offboardedUserId + offboardedUserEmail in the payload — distinguishable from individual revokes for offboarding-sweep queries. Refuses to revoke the caller's own keys (server-side gate, not just UI). Closes the audit-chain gap where single-row revoke previously silently flipped revoked_at without emitting any event — single-row revoke now also emits api-key.revoked.
API key expiration + rotation reminders — SOC 2-grade key hygiene
Set explicit expiration on key creation (Never / 30 / 90 / 180 / 365 days, capped at 365) or edit on an existing key from /api-keys/[id]/usage. Auth path refuses expired keys with a 401 (reason=key-expired) — auto-revocation without a separate cron flip. Daily Inngest sweep at 03:00 UTC finds active keys within 7 days of expiry and emails workspace owners + admins with the key name + prefix + expiry date + manage URL; throttled to one email per ~6 days inside the window so the same key doesn't spam your inbox. Three new event types — api-key.expiration-set / .expiration-warning-sent / .expired — landing on the hash-chained audit log so "every key that expired this quarter" is one /audit query. Per-row expiration badge on /api-keys: gray "expires YYYY-MM-DD", amber "expires in 5d", red "expired 3d ago".
Workspace settings — owner-editable operational defaults
Single admin page at /settings/tenant for four operational defaults that previously required support intervention: per-tenant default API rate limit (resolution chain: per-key override → per-tenant default → 600 rpm global floor), default retention class auto-applied to new uploads, custom intro paragraph rendered on the compliance evidence-packet cover page (per-firm "this packet was generated for ..." text), and digest reply-to override so activity-digest replies route to the firm's inbound triage inbox instead of hello@kodori.ai. Every field is optional — null falls back to the platform default. Each save emits tenant.settings-updated on the audit chain with a delta of what changed.
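The rate-limit resolution chain reads naturally as a null-coalescing fallback. A minimal sketch, with a hypothetical function name and `null` standing for "not set" as the text describes:

```typescript
const GLOBAL_FLOOR_RPM = 600; // platform default from the text

// per-key override → per-tenant default → global floor
function resolveRateLimitRpm(perKeyRpm: number | null, tenantDefaultRpm: number | null): number {
  return perKeyRpm ?? tenantDefaultRpm ?? GLOBAL_FLOOR_RPM;
}
```

A key with no override in a tenant that sets 1200 resolves to 1200; clearing the tenant default returns that key to 600.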
Per-API-key rate limiting with standard 429 envelope
Every API key carries a per-minute request cap (default 600 rpm = 10 rps; per-key overrides via api_keys.requests_per_minute). Successful responses carry the standard X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset headers so integrators always see their remaining quota. Exceeding the cap returns 429 with Retry-After — same shape Stripe and GitHub return, so existing client libraries that respect Retry-After work without changes. Atomic per-key counter via Postgres row-level locking on ON CONFLICT DO UPDATE — the cap is global per-key across multi-instance deployments.
API key + webhook issuance with one-shot reveal
Mint keys at /api-keys and webhook secrets at /webhooks. The plaintext is shown exactly once at creation; only the prefix remains visible afterward. sha256 hash stored at rest; constant-time compare on verify. Distinct prefix shapes (k_… vs whsec_…) so a leaked one can't be confused with the other.
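The store-a-hash, compare-in-constant-time pattern can be sketched with Node's `crypto` module as below. This shows the general technique only; the function names are hypothetical and the storage format (hex) is an assumption.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash the plaintext once at issuance; only this digest is stored at rest.
function hashSecret(plaintext: string): Buffer {
  return createHash("sha256").update(plaintext, "utf8").digest();
}

// Compare a presented key against the stored digest (hex is an assumed storage format).
function verifySecret(presented: string, storedHashHex: string): boolean {
  const stored = Buffer.from(storedHashHex, "hex");
  const candidate = hashSecret(presented);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return stored.length === candidate.length && timingSafeEqual(stored, candidate);
}
```

Because only the digest is kept, a lost plaintext cannot be recovered from the database, which is why the reveal happens exactly once.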
Internal MCP tool catalog (same as agent)
Every read and write operation is a typed MCP tool with a Zod input schema. Tools are the unit of audit, the unit of permission, the unit of agent capability. The internal MCP server is the foundation; the public API and the agent are both thin shims over it.
Public MCP server at /api/mcp
JSON-RPC 2.0 over Streamable HTTP per the 2025-11-25 MCP specification. Any conformant client (Claude Desktop, Cursor, ChatGPT desktop, Kodokyo's agent, custom integrations) connects with a Kodori API key and gets the entire 75+ tool catalog. Read scope sees read tools; write scopes light up writes; delete scope adds tombstone. Every call emits an audit event with actorKind=agent and actorId=key-issuer, so /audit?actor=apikey:<prefix> shows exactly what an external client did.
Official TypeScript SDK on npm
@kumokodo/kodori-sdk wraps the public REST surface and provides MCP connection helpers. Typed methods for every endpoint (kodori.documents.rename, kodori.search.run, etc.); KodoriApiError carries machine-readable code + details for refusals; mcp helpers generate the exact JSON snippet for Claude Desktop, Cursor, or any custom MCP client. ESM-only, Node 18+ / Bun / Deno / Cloudflare Workers. Pairs with Anthropic's @modelcontextprotocol/sdk for the full client pattern.
Webhook → Slack Block Kit
Pick "Slack" as the format on a webhook subscription and Kodori renders each event as Block Kit at the destination URL. Pair the format with the event-type filter and your /compliance channel sees readable messages for legal-hold.applied, document.dlp-flagged, anomaly.detected, plus their resolution events — no glue code, no lambda translator.
External read-connectors — 6/6 vendors live: Slack, Gmail, Outlook, SharePoint, OneDrive, Google Drive (/integrations)
All six vendor kinds ship E2E as of v0.7.33 — OAuth + per-vendor incremental sync + recurring sync cron (every 30 min) + searchable from the agent. Slack uses bot-token OAuth v2 with per-channel oldest-ts cursors; the Microsoft trio shares a single OAuth dance and uses Graph delta-cursor sync; Gmail uses users.history.list delta + per-message body extraction; Google Drive uses changes.list delta + page-token cursors. Tokens persist encrypted at rest (AES-256-GCM with scrypt-derived key from AUTH_SECRET — extends to BYO-KMS when tenant-key SDK integration lands). Lifecycle event types: `external-connector.created / -paused / -resumed / -revoked / -sync-completed / -sync-failed`. Pause / Resume / Revoke + Sync now per connector at /integrations/[id].
Outlook + Gmail attachments — emailed PDFs, decks, contracts surface alongside the message body
outlookSyncWorker fans out per-message Graph attachments calls for messages with hasAttachments=true (50-message-per-run cap) and filters to the fileAttachment kind (skipping inline images, itemAttachment, and referenceAttachment so content is not duplicated from its actual source). gmailSyncWorker walks payload.parts for binary attachments using the existing format=full payload tree (no extra API calls). Both byte-fetchers honor the 50MB cap matching Kodori upload limits. After this, all 6 connector kinds emit document rows for file content, so searchExternalContent answers "find every contract attached anywhere this quarter" with one MCP call.
Slack file attachments — PDFs, contracts, decks shared in channels surface alongside messages
Slack connector now walks `files.list` per sync run and persists file uploads as external_documents. Bytes get downloaded via `url_private_download` (Bearer bot-token) and run through the existing extractor cascade (Azure Doc Intel → Office adapters → Whisper → DocAI → Claude PDF → builtin-text). After extraction, a Slack-shared PDF contract is searchable on its body content via `searchExternalContent` — not just on the channel message that introduced it. `files:read` scope is additive so existing Slack connectors keep working; click Re-authorize on /integrations to pick up the new capability.
Per-connector sync cadence — 5 min for active workspaces, 24 hours for archives
Default is 30 min globally. Per-row override on /integrations/[id] tunes from 5 min (active sales Slack, executive Outlook) up to 24 hours (legacy SharePoint archives). Lower bound 5 min protects you from accidentally rate-limiting your Slack bot. Lands on the hash-chained audit log via `external-connector.cadence-set` so "who set this connector to 5-min sync" has an answer.
Extraction retry sweep — recovers transient outages within 6 hours
Inngest cron fires every 6 hours, sweeps external_documents WHERE extracted_at IS NULL AND (extraction_error IS NOT NULL OR queued > 1h ago), re-fires the extraction event. Recovers from Anthropic outages, Azure rate-limit storms, Inngest dispatch loss — all the transient causes that step-retry alone misses. Permanent failures (oversize, unsupported mime, 404) still stick to the row but get one cheap retry every 6h before stopping. Capped at 200 rows per tick so a 5000-row backfill recovery doesn't saturate Inngest.
Connector file text extraction — searchable on body content, not just filename
When a file from SharePoint, OneDrive, or Google Drive syncs in, the bytes get downloaded and run through the same extractor cascade Kodori-native uploads use (Azure Doc Intel → Office adapters → Whisper → Google DocAI → Claude PDF → builtin-text). Result: extracted plain text lands on the external_documents row + the searchExternalContent FTS index picks up body matches, not just filename matches. Google-native Docs/Sheets/Slides export via Drive's /export endpoint. 50MB byte cap matches the Kodori upload limit; 2MB stored-text cap (covers ~500 pages dense prose) keeps storage bounded. Concurrency-keyed on documentId so two parallel ticks don't double-spend the LLM call.
Recurring connector sync — automated freshness without operator clicks
Single Inngest cron at `*/30 * * * *` selects every connected connector with `lastSyncCompletedAt < now-25min` and fans out one sync event per row (capped at 200 per tick). Manual "Sync now" + cron tick serialize cleanly per connector via concurrency.key=connectorId, so a human-triggered sync mid-tick doesn't race the cron. Every cursor advance + sync completion lands on the audit chain via the existing `external-connector.sync-completed / -failed` event types.
unifiedSearch — agent retrieves across Kodori docs + every connector in one MCP call
New typed MCP tool fires hybridSearch (Kodori-native docs) + searchExternalContent (connector content) in parallel and fuses the three ranked lists (Kodori-document + external-message + external-document) via Reciprocal Rank Fusion (k=60). Each hit carries a `kind` discriminator and `vendor` tag for optional grouping in the answer; default is ranking-by-best-overall. The agent reaches for unifiedSearch on "find every relevant thing about X" queries when the user hasn't named a specific source — closes the multi-call overhead from the parallel-fire pattern (D236).
Public REST + SDK 0.2.0 expose connector content programmatically
Two new public REST endpoints under the existing `search:read` scope: `POST /api/v1/search/external` mirrors the searchExternalContent MCP tool (query + optional kind filter; returns messages + documents with snippets + vendor URLs); `GET /api/v1/connectors` lists configured connectors with status + content + extraction counts (no tokens, no scope strings, no config payloads). `@kumokodo/kodori-sdk` 0.2.0 wraps both with typed methods: `kodori.externalSearch.run({ query, kind?, limit? })` and `kodori.connectors.list()`. Zero breaking changes from 0.1.1. OpenAPI 3.1 manifest at /api/openapi.json includes both endpoints.
searchExternalContent — FTS + pgvector + RRF across all six connectors (mirrors hybridSearch quality)
Typed MCP tool combining Postgres full-text search (`websearch_to_tsquery` + `ts_rank_cd` + `ts_headline` snippets, GIN-indexed via migration 0084) with pgvector cosine similarity (HNSW-indexed) via Reciprocal Rank Fusion (k=60) — same shape as hybridSearch over Kodori documents. Output carries `score` + `source` ('keyword'|'semantic'|'both') so the agent reasons about hit strength. Tenant-scoped, kind-filterable, paused / revoked connectors excluded automatically. Graceful fallback to FTS-only when `OPENAI_API_KEY` is unset.
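For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula, score = Σ 1 / (k + rank) with k=60, applied to a keyword list and a semantic list. It illustrates the technique only; the function and type names are hypothetical and this is not the tool's actual code.

```typescript
const RRF_K = 60;

interface FusedHit {
  id: string;
  score: number;
  source: "keyword" | "semantic" | "both";
}

// Each input is a list of ids ordered best-first by one retriever.
function fuseRankings(keywordIds: string[], semanticIds: string[], k: number = RRF_K): FusedHit[] {
  const hits = new Map<string, FusedHit>();
  const add = (ids: string[], source: "keyword" | "semantic"): void => {
    ids.forEach((id, index) => {
      const contribution = 1 / (k + index + 1); // rank is 1-based
      const existing = hits.get(id);
      if (existing === undefined) {
        hits.set(id, { id, score: contribution, source });
      } else {
        existing.score += contribution;
        if (existing.source !== source) existing.source = "both";
      }
    });
  };
  add(keywordIds, "keyword");
  add(semanticIds, "semantic");
  return Array.from(hits.values()).sort((a, b) => b.score - a.score);
}
```

A document ranked second by keyword and first by semantic search outscores one ranked first by keyword alone, which is why hits tagged `both` tend to float to the top.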
Pillar 08
Workspace administration
A workspace is a single tenant. Onboarding is SSO; permissions are explicit; admins have the controls without needing a CLI.
Auth.js v5 with Google + Microsoft SSO
No passwords on Kodori — your work account is the only credential. Microsoft Entra ID provider re-enables once an Azure tenant is connected; Google ships today.
Five-tier role model
viewer / contributor / auditor / admin / owner. The first user in a fresh tenant becomes owner; everyone after starts as viewer. Owners and admins bypass per-document grants by role.
Invite teammates by email — multi-tenant onboarding
Owners and admins issue invites from /members; Kodori emails the recipient a branded invite link, and the same URL stays visible in the pending list as a manual fallback (with one-click Resend / Revoke). The invitee signs in with Google or Microsoft, accepts, and is moved into your workspace with the role you picked. Audit log records the move including the previous tenant id.
Bulk invite — paste up to 100 emails, one role, one click
Onboard a whole team without 20 individual invites. The "Invite many at once" form on /members takes a textarea (paste from a Sheet column, an Outlook To: field, or any whitespace / comma / semicolon separated list), one role for the batch, and a Send button. Kodori parses, validates email shape per token, dedupes against existing members + pending invites in one query, checks each email against the seat cap, and sends the ones that pass. Afterwards, a results banner reports a count for each bucket: sent / already in workspace / invalid email / over seat cap / send-failed. Partial success is communicated, not silently dropped. Each successful invite emits the same permission.granted event a single-row invite does, so the audit chain treats a bulk batch identically to N individual invites.
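The parse-validate-dedupe-bucket flow can be sketched as below. Everything here is illustrative: the names are hypothetical, the email-shape regex is a deliberately loose stand-in, truncating at 100 tokens is an assumed way to apply the batch cap, and the send-failed bucket is omitted because it depends on the mail provider.

```typescript
interface InviteBuckets {
  toSend: string[];
  alreadyInWorkspace: string[];
  invalidEmail: string[];
  overSeatCap: string[];
}

const MAX_BATCH = 100; // "paste up to 100 emails"
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // loose shape check, an assumption

// `existing` holds lowercased emails of current members and pending invites.
function bucketInvites(pasted: string, existing: Set<string>, seatsRemaining: number): InviteBuckets {
  const buckets: InviteBuckets = { toSend: [], alreadyInWorkspace: [], invalidEmail: [], overSeatCap: [] };
  const seen = new Set<string>();
  const tokens = pasted.split(/[\s,;]+/).filter((t) => t.length > 0).slice(0, MAX_BATCH);
  for (const token of tokens) {
    const email = token.toLowerCase();
    if (seen.has(email)) continue; // duplicate within the paste
    seen.add(email);
    if (!EMAIL_SHAPE.test(email)) buckets.invalidEmail.push(token);
    else if (existing.has(email)) buckets.alreadyInWorkspace.push(email);
    else if (buckets.toSend.length >= seatsRemaining) buckets.overSeatCap.push(email);
    else buckets.toSend.push(email);
  }
  return buckets;
}
```

Each bucket's length maps directly onto one of the counts reported after the batch is sent.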
User groups for batched permission grants
Owner / admin creates groups at /groups (e.g. "litigation team", "outside counsel for Smith v Acme", "AP clerks"), adds tenant users as members, and grants the group read or write access on a document via the new "Share with group" affordance in the /doc/[id] Access panel. Every member of the group inherits the grant transitively via the canReadDocument resolver's group-membership branch — no per-user grants needed. Deny-wins still applies (user-level deny beats any group-level allow). Soft-deleted groups stop applying immediately. Schema has a nullable external_id column ready for future SCIM/SAML group sync. See D331.
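To make the resolution order concrete, here is a simplified sketch of a read check with a group-membership branch. It is not the actual canReadDocument resolver: the `canReadSketch` name and all three interfaces are hypothetical, role-based bypass is ignored, and only read access is modeled.

```typescript
// Hypothetical shapes for illustration.
interface Group {
  id: string;
  memberIds: string[];
  deletedAt: Date | null; // non-null means soft-deleted
}

interface UserGrant {
  userId: string;
  effect: "allow" | "deny";
}

interface GroupGrant {
  groupId: string; // group grants are modeled as allow-only here
}

function canReadSketch(
  userId: string,
  userGrants: UserGrant[],
  groupGrants: GroupGrant[],
  groups: Group[],
): boolean {
  const mine = userGrants.filter((g) => g.userId === userId);

  // Deny wins: a user-level deny beats any group-level allow.
  if (mine.some((g) => g.effect === "deny")) return false;
  if (mine.some((g) => g.effect === "allow")) return true;

  // Group branch: only groups that are not soft-deleted still apply.
  const activeGroupIds = new Set(
    groups
      .filter((g) => g.deletedAt === null && g.memberIds.includes(userId))
      .map((g) => g.id),
  );
  return groupGrants.some((g) => activeGroupIds.has(g.groupId));
}
```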
Per-member role change, remove, leave, transfer ownership
Each row on /members has a role dropdown (owner / admin / contributor / viewer / auditor) and a Remove button visible to admins. Owners can also "Make owner" any non-owner row to hand off the keys. Members can leave on their own from /settings/account. Last-owner safety enforced everywhere: the only path to leave a workspace as the sole owner is to transfer ownership first.
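One way to express the last-owner rule is a single guard that every role change, removal, and leave path could call. The sketch below assumes hypothetical names (`WorkspaceRole`, `Member`, `violatesLastOwnerRule`) and is not Kodori's implementation.

```typescript
type WorkspaceRole = "viewer" | "contributor" | "auditor" | "admin" | "owner";

interface Member {
  userId: string;
  role: WorkspaceRole;
}

// nextRole === null means the member is leaving or being removed.
// Returns true when the change would leave the workspace with no owner.
function violatesLastOwnerRule(
  members: Member[],
  userId: string,
  nextRole: WorkspaceRole | null,
): boolean {
  const target = members.find((m) => m.userId === userId);
  if (!target || target.role !== "owner") return false; // non-owners are never blocked
  if (nextRole === "owner") return false; // still an owner afterwards

  const otherOwners = members.filter(
    (m) => m.role === "owner" && m.userId !== userId,
  ).length;
  return otherOwners === 0;
}
```

Under this sketch, transferring ownership first adds a second owner, after which the original owner's leave no longer trips the guard.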
Last-seen tracking on /members for offboarding + SOC 2 reviews
Each member row shows "last seen today", "last seen 3d ago", "last seen 2mo ago", or an ISO date for older sign-ins; accounts that haven't signed in since the indicator was deployed show "last seen never". Owners and admins making offboarding decisions ("this admin hasn't logged in for 6 months, deactivate?") see the diagnostic at a glance instead of grepping the audit log. A new last_sign_in_at column on users, plus a composite index on (tenant_id, last_sign_in_at DESC NULLS LAST), supports fast "who hasn't signed in in N days?" admin scans. The column is stamped on every Auth.js session refresh via the JIT user-upsert path.
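The label logic could be sketched as below. The exact cut-offs are not stated above, so the thresholds here (under 24 hours counts as today, under 30 days uses days, under 365 days uses 30-day months, anything older shows the ISO date) are assumptions, as is the `formatLastSeen` name.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// lastSignInAt is null when no sign-in has been recorded.
function formatLastSeen(lastSignInAt: Date | null, now: Date): string {
  if (lastSignInAt === null) return "last seen never";

  const days = Math.floor((now.getTime() - lastSignInAt.getTime()) / DAY_MS);
  if (days < 1) return "last seen today";                              // assumed cut-off
  if (days < 30) return `last seen ${days}d ago`;                      // assumed cut-off
  if (days < 365) return `last seen ${Math.floor(days / 30)}mo ago`;   // assumed cut-off

  // Older sign-ins fall back to an ISO date (YYYY-MM-DD).
  return lastSignInAt.toISOString().slice(0, 10);
}
```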
Inactive-user filter on /members — one-click access review queue
Admin-only chips at the top of the members section toggle between "All N members" and "Inactive 90+ days (M)" with an adjustable threshold input (1-365 days, default 90). Inactive = no recorded sign-in OR last sign-in older than the threshold — covers both never-signed-in accounts AND accounts that have gone dark. Each inactive member row keeps the existing Remove + role-change + Make-owner affordances; no new offboarding flow needed. Closes the SOC 2 quarterly access review loop ("show me everyone who hasn't signed in this quarter") and post-engagement client cleanup. URL-as-state filter so admins can share the link.
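A minimal sketch of the filter predicate and threshold handling follows. The function names are hypothetical, and the choice to clamp out-of-range input (rather than reject it) is an assumption; the 1-365 range and the default of 90 come from the description above.

```typescript
const DEFAULT_THRESHOLD_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parses the threshold from a URL parameter value (parameter name not specified here).
// Falls back to the default on missing or non-numeric input, and clamps to 1-365.
function parseThresholdDays(raw: string | null): number {
  const parsed = raw === null ? NaN : Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) return DEFAULT_THRESHOLD_DAYS;
  return Math.min(365, Math.max(1, parsed));
}

// Inactive = no recorded sign-in OR last sign-in older than the threshold.
function isInactive(lastSignInAt: Date | null, thresholdDays: number, now: Date): boolean {
  if (lastSignInAt === null) return true;
  const cutoff = now.getTime() - thresholdDays * MS_PER_DAY;
  return lastSignInAt.getTime() < cutoff;
}
```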
Onboarding emails — welcome + drip schedule
New users get a "Welcome to Kodori" email on first sign-in (⌘K hint, drag-drop hint, sample-data prompt). Three follow-up tips spaced over the first two weeks via a daily Inngest cron: try the agent (day 3), load sample data (day 7), explore compliance (day 14). Per-user idempotency in `onboarding_email_log`; per-run cap of 50 sends per kind so a post-outage backlog can't burst Resend.
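The selection step of such a daily run might be sketched like this. The kind identifiers, the `planDripRun` name, the "userId:kind" key format for already-sent entries, and the catch-up behavior (sending once a user is at or past the target day) are all assumptions; only the day offsets and the cap of 50 per kind come from the description above.

```typescript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Day offsets from the description; kind identifiers are hypothetical.
const DRIP_SCHEDULE = [
  { kind: "try-agent", day: 3 },
  { kind: "load-sample-data", day: 7 },
  { kind: "explore-compliance", day: 14 },
] as const;

type DripKind = (typeof DRIP_SCHEDULE)[number]["kind"];

interface NewUser {
  id: string;
  firstSignInAt: Date;
}

interface PlannedSend {
  userId: string;
  kind: DripKind;
}

// alreadySent holds "userId:kind" keys, standing in for rows in onboarding_email_log.
function planDripRun(
  users: NewUser[],
  alreadySent: ReadonlySet<string>,
  now: Date,
  capPerKind = 50,
): PlannedSend[] {
  const planned: PlannedSend[] = [];
  for (const { kind, day } of DRIP_SCHEDULE) {
    let plannedThisKind = 0;
    for (const user of users) {
      if (plannedThisKind >= capPerKind) break; // per-run cap per kind
      const ageDays = Math.floor((now.getTime() - user.firstSignInAt.getTime()) / ONE_DAY_MS);
      if (ageDays < day) continue; // not due yet
      if (alreadySent.has(`${user.id}:${kind}`)) continue; // idempotency check
      planned.push({ userId: user.id, kind });
      plannedThisKind += 1;
    }
  }
  return planned;
}
```

Users left over after the cap is reached would simply be picked up by a later run, since they still have no log entry for that kind.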
Pending-invites banner on the dashboard
A returning user who signed in directly (without clicking the invite email) sees a "You have a pending invite to <workspace>" banner with a one-click "Review & accept →" — no more digging up the original message.
Email diagnostics on /members
A collapsible diagnostic panel (admin-only) shows API-key status, from-address, invite-link base URL, and a one-click "Send test" form. Routes through the same Resend client as the real invite, so a green response proves the production path. Failure copy distinguishes "no API key" from "domain probably not verified" with a direct link to the Resend domains dashboard.
Cost dashboard
Per-tenant spend on Anthropic Claude, OpenAI embeddings, R2 storage, and PDF extraction, recorded at microcent resolution so per-token vendor pricing is captured without floating-point drift. The dashboard shows a 30-day total, spend by kind, a 14-day trend, and the top 20 individual cost events for triage. /costs is owner / admin only. Page loads stay sub-second at any tenant size: a daily Inngest cron (D281) pre-aggregates cost_events into a (tenant_id, day, kind) rollup table; /costs reads the rollup for completed days and raw events for today only, so aggregation cost is bounded regardless of underlying volume.
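To show why integer microcents avoid drift, here is a small sketch. The rate of 300 microcents per token is a made-up figure (equivalent to $3 per million tokens), and the function names and `CostRow` shape are hypothetical.

```typescript
const MICROCENTS_PER_CENT = 1_000_000;

// Hypothetical row shape shared by rollup rows and raw events.
interface CostRow {
  kind: string;
  microcents: number; // always an integer
}

// Integer multiply: no fractional dollars ever enter the calculation.
function tokenCost(tokens: number, microcentsPerToken: number): number {
  const cost = tokens * microcentsPerToken;
  if (!Number.isSafeInteger(cost)) throw new RangeError("cost is not a safe integer");
  return cost;
}

// Completed days come from the rollup; today comes from raw events.
function totalMicrocents(rollupRows: CostRow[], todayRawEvents: CostRow[]): number {
  return [...rollupRows, ...todayRawEvents].reduce((sum, row) => sum + row.microcents, 0);
}

// Rounding to cents happens once, at display time.
function formatUsd(microcents: number): string {
  const cents = Math.round(microcents / MICROCENTS_PER_CENT);
  const dollars = Math.floor(cents / 100);
  return `$${dollars}.${String(cents % 100).padStart(2, "0")}`;
}
```

Summing integers is exact, whereas summing fractional dollar amounts as floats is not (in JavaScript, 0.1 + 0.2 does not equal 0.3).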
WorkOS SAML-only as escape hatch
When an enterprise customer contract requires SAML/SCIM, WorkOS plugs in. Pay only when needed (~$125/mo base + $0.125/MAU) versus tier-up packages that bundle features you don't need.
Per-tenant ingest slug
Each tenant gets a short, unguessable email-ingest slug used as the plus-tag on the tenant's docs+ address. Rotates with one DB update if it ever leaks.
Pillar 09
Trust, performance, transparency
No black boxes, no ML opacity, no "trust the platform" hand-waving. Every decision is observable; every action is reversible; every byte is somewhere you can audit.
Vercel + Neon + Cloudflare R2 stack
Next.js 15.5 on Vercel; Neon Postgres (serverless, with branches per preview); Cloudflare R2 for content-addressable blob storage. No vendor at the data tier locks you in — the schema is yours.
Inngest for durable workflows
Extraction, embedding, auto-classification — all run as Inngest functions with step-level durability and retry. Per-tenant concurrency keys (event.data.tenantId) on every per-document function mean one tenant's bulk import can't starve another's classifier or webhook delivery — the platform stays tenant-isolated under load. Every cron declares single-flight concurrency. Operator-side observability via /admin/cron-status (D278) + /admin/queue-depth (D280) — operators see "did the cron run?" + "is the work backing up?" without tailing logs. HIPAA tier available; Temporal pre-architected as the fallback if pricing ever shifts.
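For readers unfamiliar with concurrency keys, the sketch below shows the general shape of the options involved. The interfaces are local stand-ins that mirror the `concurrency` option Inngest accepts on function definitions; the function ids and the limit of 5 are hypothetical, and the real definitions would be passed to Inngest's `createFunction` rather than held in plain objects.

```typescript
// Local stand-ins for the relevant slice of the function options (assumption).
interface ConcurrencySketch {
  limit: number;
  key?: string; // expression evaluated against the triggering event
}

interface FunctionOptionsSketch {
  id: string;
  concurrency: ConcurrencySketch;
}

// Per-document function: the limit applies per distinct tenantId,
// so one tenant's backlog cannot consume another tenant's slots.
const classifyDocumentOptions: FunctionOptionsSketch = {
  id: "classify-document", // hypothetical id
  concurrency: { key: "event.data.tenantId", limit: 5 }, // limit is a made-up value
};

// Cron: single-flight, so a slow run is never overlapped by the next one.
const costRollupCronOptions: FunctionOptionsSketch = {
  id: "cost-rollup-cron", // hypothetical id
  concurrency: { limit: 1 },
};
```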
Mobile-responsive across the entire app
Wide tables wrap in horizontal scroll; page padding scales by breakpoint; the agent drawer goes full-width on phones. Tested on iOS Safari and Chrome Android.
Open-source MCP tool surface
The internal MCP server and the public MCP server share the same tool catalog. External integrators can call the same tools the agent does, with their own auth. The catalog is documented; the schemas are typed.
Vitest unit suite
26 tests across packages/workflow + apps/web cover the embedding chunker, the illustrator-ai magic-byte detector, the revertable predicate, and the API-key bearer parser. pnpm test runs the suite via Turbo.
Ready to try it on your own documents?
Sign in with Google, drop a folder, ask the agent a question. The whole flow takes about five minutes.