Quick reference for every domain term in Kodori. Hover or click the `?` icons throughout the app for the same definitions in context.
## Documents and organization
- **Document** — a single file that lives in Kodori. Every upload becomes one document with a content hash, a sensitivity label, an optional retention class, and (after extraction) a searchable text projection.
- **Collection** — a virtual grouping of documents. Kodori has no physical folder tree (D2 — collections-as-views); every grouping is a queryable view over metadata. Six kinds: cabinet, drawer, folder, matter, project, custom.
- **Cabinet** — top-level org structure. Cannot be nested. Used for "Acme Corp" or "Practice Areas" or "Operations."
- **Drawer** — sub-section within a cabinet. Used for "Contracts", "Employment", "Litigation" inside a cabinet.
- **Matter** — legal-style engagement grouping. Conflict-of-interest checks fire at create-time. Typically top-level but can nest under a cabinet.
- **Project** — AEC- or initiative-style scope. Same shape as a matter; different semantics.
- **Folder** — generic kind. Can nest under anything. Default when nothing else fits.
- **Custom** — opaque kind for tenant-specific taxonomies.
- **Pin** — adding a document to a collection's curated members.
- **Smart collection rule** — a saved query that auto-populates a collection's members. The membership recomputes as new documents land or change metadata.
## Permissions and access
- **ACL** — access control list. Kodori's permissions model is deny-wins: a single deny rule overrides any number of allows.
- **Permission-trimmed** — every search, list, and read operation filters at the index, not after. A viewer never sees a hit teaser for a record they can't open.
- **Tenant** — a single workspace. Most customers have one tenant; firms with separate practice groups sometimes run multiple. Users belong to one tenant at a time.
- **Owner / Admin / Contributor / Viewer / Auditor** — the five roles. Owner is the billing contact + can delete the tenant; admin manages members + permissions + retention; contributor uploads + edits; viewer reads; auditor reads everything (including the audit log) but cannot mutate.
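
The deny-wins rule can be sketched as a small evaluation function. This is a minimal illustration, not Kodori's implementation: the `Rule` shape, the `can_access` name, and the principal ids are hypothetical, and refusing access when no rule matches is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Literal


@dataclass(frozen=True)
class Rule:
    principal: str                      # hypothetical id, e.g. "user:dana" or "group:litigation"
    effect: Literal["allow", "deny"]


def can_access(principals: set[str], rules: Iterable[Rule]) -> bool:
    """Deny-wins: one matching deny refuses access, however many allows match."""
    matching = [r for r in rules if r.principal in principals]
    if any(r.effect == "deny" for r in matching):
        return False
    # Assumption: with no matching allow, access is refused by default.
    return any(r.effect == "allow" for r in matching)


rules = [
    Rule("group:litigation", "allow"),
    Rule("user:dana", "deny"),
]

# Dana is allowed through the group but denied personally, so the deny wins.
dana_can_read = can_access({"user:dana", "group:litigation"}, rules)
# Lee only matches the allow.
lee_can_read = can_access({"user:lee", "group:litigation"}, rules)
```

Checking for a deny before looking at allows keeps the result independent of rule order, which is the point of a deny-wins model.
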
## Sensitivity and governance
- **Sensitivity label** — public / internal / confidential / restricted / regulated. Drives DLP scanning, share-link gates, and retention-class compatibility.
- **Retention class** — a named policy with a retain-for duration + disposal mode (review or auto-tombstone). Documents inherit the class; admins propose changes.
- **Legal hold** — a deny-wins gate that prevents tombstoning, modification, or sensitivity-downgrade for a defined subject set. Outranks every other policy. Released only by an admin with a stated reason.
- **Tombstone** — soft-delete. Documents move to /trash and stay readable for the retention window before bytes are purged. Restorable until the disposal cron runs.
- **DLP scan** — a Haiku pass that surfaces SSNs, credit cards, account numbers, etc. that may have been added to the document. Admins confirm or dismiss (as a false positive) each finding.
- **Anomaly** — an unusual access or mutation pattern (high-volume regulated reads, agent burst, off-hours mass downloads). Surfaces on /anomalies for admin review.
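
The legal-hold gate can be pictured as a check that runs before any other policy. The sketch below is illustrative only: the action names and the `blocked_by_hold` function are hypothetical, and it assumes the sensitivity labels are ordered from least to most sensitive in the order listed above.

```python
# Assumption: labels run from least to most sensitive in the order the glossary lists them.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "restricted", "regulated"]


def is_downgrade(current: str, proposed: str) -> bool:
    """True when the proposed label is less sensitive than the current one."""
    return SENSITIVITY_ORDER.index(proposed) < SENSITIVITY_ORDER.index(current)


def blocked_by_hold(action: str, under_hold: bool,
                    current: str = "", proposed: str = "") -> bool:
    """Return True when a legal hold forbids the action (hypothetical action names)."""
    if not under_hold:
        return False
    if action in ("tombstone", "modify"):
        return True
    if action == "set_sensitivity":
        return is_downgrade(current, proposed)
    return False  # assumption: other actions, such as reads, are unaffected


tombstone_blocked = blocked_by_hold("tombstone", under_hold=True)
downgrade_blocked = blocked_by_hold("set_sensitivity", True, "restricted", "internal")
upgrade_blocked = blocked_by_hold("set_sensitivity", True, "internal", "restricted")
```
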
## Extraction and content
- **Extraction** — the pipeline that reads text out of an upload so search and the agent can use the contents. Cascade: Azure Document Intelligence → Office adapters → Whisper (audio) → Google DocAI → raster-convert → Claude vision → built-in text.
- **Status: succeeded** — extraction completed; text is in the search index.
- **Status: pending / running** — extraction is queued or in flight. Real extractions finish in seconds; > 5 minutes counts as "stuck" and is eligible for re-run.
- **Status: failed** — the extractor matched the MIME but raised an error. Click into a doc on /extraction-issues to read the specific message.
- **Status: unsupported** — no extractor handles this MIME. Common causes: .one (OneNote), .tar.gz / .7z / .rar archives, raw camera files (.NEF / .CR2), CAD (.dwg), encrypted PDFs, video. See /help/how-extraction-works for the full list.
- **Status: dismissed (D327)** — operator-acknowledged "won't fix." Excluded from "Re-run for all" so a known-bad extraction doesn't burn API cost on every bulk run. Set via the per-row "Won't fix" button on /extraction-issues; cleared via "Re-enable" on the Dismissed tab.
- **Re-run for all** — the dashboard button that re-queues every doc in a re-runnable state (failed + unsupported + stuck + never-queued). Skips dismissed rows. Re-running an unsupported doc is FREE (short-circuits before any API call); re-running a failed doc CAN cost money (extractor re-invokes the LLM). Durable: handles tenants of any size via a chunked Inngest workflow.
- **Group (D331)** — a named, tenant-scoped bucket of users. Owner / admin creates groups at /groups and adds members. Granting a group read access on a document propagates the grant to every member transitively. Common shapes: "litigation team", "outside counsel for Smith v Acme", "AP clerks". Deny-wins still applies — a user-level deny beats any group-level allow. Groups don't carry their own role (a user's tenant role is independent); they're purely permission-grant batches today.
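
The eligibility rules behind "Re-run for all" (failed, unsupported, stuck, and never-queued rows qualify; dismissed rows are skipped) can be sketched as a predicate. The `ExtractionRow` shape, its `queued_at` field, and the use of `None` for "never queued" are assumptions made for illustration, not Kodori's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

STUCK_AFTER = timedelta(minutes=5)  # the glossary's "> 5 minutes counts as stuck"


@dataclass
class ExtractionRow:
    # Hypothetical shape: status is None when the doc was never queued.
    status: Optional[str]
    queued_at: Optional[datetime] = None  # when the row entered pending/running


def is_rerunnable(row: ExtractionRow, now: datetime) -> bool:
    """Mirror of the re-runnable states: failed + unsupported + stuck + never-queued."""
    if row.status is None:
        return True  # never queued
    if row.status in ("failed", "unsupported"):
        return True
    if row.status in ("pending", "running"):
        # Stuck only once it has been in flight for more than five minutes.
        return row.queued_at is not None and now - row.queued_at > STUCK_AFTER
    return False  # succeeded and dismissed rows are skipped


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = {
    "failed": ExtractionRow("failed"),
    "dismissed": ExtractionRow("dismissed"),
    "fresh": ExtractionRow("running", now - timedelta(seconds=30)),
    "stuck": ExtractionRow("running", now - timedelta(minutes=12)),
    "never": ExtractionRow(None),
}
eligible = sorted(name for name, row in rows.items() if is_rerunnable(row, now))
```
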
## Search and the agent
- **Hybrid search** — Postgres full-text and pgvector embeddings run in parallel, fused via Reciprocal Rank Fusion. Each hit shows whether it came from keyword, semantic, or both.
- **Bates** — sequential numbering applied to documents in legal productions (e.g. "ABC0001" through "ABC0500"). Kodori indexes Bates ranges so /search?bates=ABC0123 finds the right document.
- **Saved search** — a query you've named for re-use. Surfaces in the sidebar with a "new since last opened" badge.
- **Agent** — natural-language assistant with access to the same tools as the public MCP server. Permission-trimmed: the agent can never surface a doc the asker cannot read.
- **MCP tool** — a typed action (`createDocument`, `grantPermission`, `setSensitivity`, etc.) the agent can invoke. Same catalog the public MCP server exposes — every external integration uses these.
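
Reciprocal Rank Fusion, as used by hybrid search, scores each document by summing `1 / (k + rank)` over the result lists it appears in. The sketch below shows the general technique rather than Kodori's code: the function name and document ids are hypothetical, and `k = 60` is the constant commonly used in the literature, assumed here for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60):
    """Fuse ranked id lists; return (doc_id, score, sources) sorted by fused score."""
    scores: dict[str, float] = defaultdict(float)
    sources: dict[str, list[str]] = defaultdict(list)
    for source, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
            sources[doc_id].append(source)  # lets the UI label keyword / semantic / both
    ordered = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return [(doc_id, scores[doc_id], sources[doc_id]) for doc_id in ordered]


fused = reciprocal_rank_fusion({
    "keyword": ["doc-a", "doc-b", "doc-c"],
    "semantic": ["doc-b", "doc-d", "doc-a"],
})
fused_order = [doc_id for doc_id, _, _ in fused]
# fused_order == ["doc-b", "doc-a", "doc-d", "doc-c"]
```

Documents found by both retrievers accumulate two terms, so `doc-b` and `doc-a` outrank the single-source hits even though neither tops both lists.
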
## Quotas, caps, and cost
- **Cap warning** — a banner shown when you're at ≥80% of any plan limit (questions, extractions, storage, seats, AI budget). Drives upgrade conversations.
- **Quota** — a per-tenant ceiling on a specific action (e.g. "200 questions/seat/month" on Team). Refusal surfaces inline; the upgrade CTA is always one click away.
- **AI budget / cost cap** — the dollar-cost ceiling on paid-API calls (Anthropic, OpenAI). Customers see only utilization%; owners see dollars on /admin/cost.
- **BYO key** — the Business+ escape hatch where the tenant pastes their own Anthropic API key. Kodori bills against their account; the cost cap becomes a no-op.
- **Override** — per-tenant lift of the plan default cost cap. Encodes Enterprise contractual ceilings + early-access bumps without changing tier.
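
The ≥80% cap-warning rule amounts to a threshold check per plan limit. A minimal sketch follows; the `cap_warnings` name, the limit keys, and the treatment of non-positive limits as "unlimited" are assumptions for illustration.

```python
WARN_PERCENT = 80  # the glossary's "≥80% of any plan limit"


def cap_warnings(usage: dict[str, int], limits: dict[str, int]) -> list[str]:
    """Return the names of limits whose utilization is at or above the warning threshold."""
    warnings = []
    for name, limit in limits.items():
        if limit <= 0:
            continue  # assumption: a non-positive limit means "no cap"
        used = usage.get(name, 0)
        # Integer comparison avoids floating-point edge cases at exactly 80%.
        if used * 100 >= limit * WARN_PERCENT:
            warnings.append(name)
    return warnings


limits = {"questions": 200, "storage_gb": 100, "seats": 10}
usage = {"questions": 160, "storage_gb": 42, "seats": 9}
at_risk = cap_warnings(usage, limits)
# at_risk == ["questions", "seats"]
```
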
## Audit and trust
- **Event** — an immutable record of a consequential mutation. Every event chains via SHA-256 prev_hash to the previous one — tampering breaks the chain.
- **Audit log** — the surfaced view of every event in the tenant. Filterable by stream, actor, type, time. The same data the SOC 2 auditor reads.
- **Stream** — the per-entity event sequence (e.g. `document/<id>`). Operations against an entity append to its stream in order.
- **Audit-chain verification** — periodic walk of the prev_hash links to prove the chain is intact. Fires weekly + on-demand from /audit.
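
The prev_hash chain and its verification walk can be illustrated with a small sketch. Only the use of SHA-256 and a `prev_hash` link comes from the definitions above; the event shape, the JSON canonicalization, and the all-zero genesis value are assumptions, not Kodori's actual encoding.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: placeholder prev_hash for the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], payload: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": event_hash(prev_hash, payload),
    })


def verify_chain(chain: list[dict]) -> bool:
    """Walk the links: each event must point at its predecessor and match its own hash."""
    expected_prev = GENESIS
    for event in chain:
        if event["prev_hash"] != expected_prev:
            return False
        if event["hash"] != event_hash(event["prev_hash"], event["payload"]):
            return False
        expected_prev = event["hash"]
    return True


chain: list[dict] = []
for event_type in ("document.created", "permission.granted", "sensitivity.set"):
    append_event(chain, {"type": event_type})

intact = verify_chain(chain)             # True for an untouched chain
chain[1]["payload"]["type"] = "edited"   # simulate tampering with a stored event
tampered_ok = verify_chain(chain)        # False: the stored hash no longer matches
```

Because each hash covers the previous one, changing any stored event invalidates its own hash, and recomputing that hash would break the next event's `prev_hash` link.
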
## Operational
- **Tenant Inngest sync** — the deployment ↔ Inngest function-set link. Stale syncs cause silent extraction death; check first when whole-fleet extractions fail.
- **/extraction-issues** — drill-in for the dashboard's stuck/failed/unsupported counts. Three filter tabs, permission-trimmed.
- **/admin/cost** — owner-facing dollar dashboard. The symmetric surface to the customer-facing utilization%.
- **/audit** — full audit log with filters + chain-verification controls.
The `?` icon next to most labels in the app surfaces these definitions in context. If you spot a term that lacks a tooltip, the convention is: hover over the term in this glossary to confirm the definition, then ask us to add the tooltip. We'd rather over-document than leave a new user staring at jargon.