Security & Compliance · The technical substrate
Audit-defensible by construction.
Kodori is built on primitives that don’t need a certificate to verify — hash-chained audit, deny-wins ACL, SSO-only auth, content-addressable encrypted storage. The certifications are on the roadmap below. The substrate exists today and you can inspect it yourself.
Live today · Verifiable in the running product
Technical controls.
Tamper-evident audit chain
Every consequential mutation appends an event to a per-tenant log. Each event's `prev_hash` is the SHA-256 of the previous event in the same tenant's stream, so every hash commits to the full history before it: modifying any earlier event breaks the link that follows it. The chain is the artifact: deposition counsel, peer reviewers, and 21 CFR Part 11 inspectors can verify continuity end-to-end by recomputing hashes, without replaying any of the underlying operations.
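A minimal sketch of how such a chain can be built and checked. The event shape, the serialization passed to the hash, and the helper names (`AuditEvent`, `hashEvent`, `appendEvent`, `firstBrokenLink`) are assumptions for illustration; only `prev_hash` and SHA-256 come from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical event shape: only `prev_hash` is named in the text above.
interface AuditEvent {
  seq: number;
  type: string;
  payload: string; // assumed to be canonical JSON, already serialized
  prev_hash: string; // hex SHA-256 of the previous event, "" for the first
}

// Assumed canonical form: hash every field, including prev_hash, so each
// hash transitively commits to the whole history before it.
function hashEvent(e: AuditEvent): string {
  return createHash("sha256")
    .update(JSON.stringify([e.seq, e.type, e.payload, e.prev_hash]))
    .digest("hex");
}

function appendEvent(chain: AuditEvent[], type: string, payload: string): AuditEvent[] {
  const last = chain.length > 0 ? chain[chain.length - 1] : undefined;
  const event: AuditEvent = {
    seq: chain.length,
    type,
    payload,
    prev_hash: last ? hashEvent(last) : "",
  };
  return [...chain, event];
}

// Returns the index of the first event whose prev_hash does not match the
// recomputed hash of its predecessor, or -1 if every link is intact.
function firstBrokenLink(chain: AuditEvent[]): number {
  for (let i = 1; i < chain.length; i++) {
    if (chain[i].prev_hash !== hashEvent(chain[i - 1])) return i;
  }
  return -1;
}
```

In a scheme like this the newest event has no successor to vouch for it, so a verifier would also need the latest hash from an independent source (an exported checkpoint, for example). Whether Kodori anchors the head that way is not stated here.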
Permission-trimmed at the index
Read access checks run inside the SQL query, not as a post-filter. A user without read on a document never sees a search hit, a dashboard count, an extracted-text snippet, an API response, or an agent reply for that document. Deny rules always win over allow rules. The agent uses the exact same canReadDocument gate the UI uses — there is no agent-bypass.
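The deny-wins rule can be illustrated with a small in-memory sketch. It stands in for the SQL-level gate described above and is not the real `canReadDocument`; the `AclRule` shape, the `canRead` and `trimToReadable` names, and the default-deny behaviour when no rule matches are all assumptions.

```typescript
type Effect = "allow" | "deny";

// Hypothetical rule shape for illustration.
interface AclRule {
  principalId: string;
  documentId: string;
  effect: Effect;
}

// In-memory stand-in for the read gate; the real check is described as
// running inside the SQL query rather than in application code.
function canRead(rules: AclRule[], principalId: string, documentId: string): boolean {
  const matching = rules.filter(
    (r) => r.principalId === principalId && r.documentId === documentId,
  );
  if (matching.some((r) => r.effect === "deny")) return false; // deny always wins
  return matching.some((r) => r.effect === "allow"); // assumed: no rule means no access
}

// Trim before anything is counted or rendered, so search hits, dashboard
// counts, and snippets are all derived from the same visible set.
function trimToReadable(rules: AclRule[], principalId: string, documentIds: string[]): string[] {
  return documentIds.filter((id) => canRead(rules, principalId, id));
}
```

Routing the UI, the API, and the agent through one gate function is what makes "no agent-bypass" checkable: there is only one place where a read decision is made.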
Hold-deny-wins on destruction
A document bound to an active legal hold refuses to delete (tombstoneDocument throws), refuses to dispose under retention (the review-queue button disables itself), and refuses to lower its sensitivity tier (setDocumentSensitivity blocks the downgrade). Three independent enforcement points for a single invariant — UI affordance, MCP tool gate, and audit chain. Subjects stay on the hold record forever as audit evidence even after release.
SSO-only authentication
No password is ever set on Kodori. Sign-in is Google Workspace or Microsoft Entra ID — your existing identity provider is the only credential. This removes the largest account-takeover attack surface: there are no Kodori passwords to phish, leak, or rotate. WorkOS SAML-only is the escape hatch when an enterprise contract requires SAML/SCIM specifically.
Content-addressable storage with encryption at rest
Document bytes live in object storage keyed by their SHA-256 content hash — the hash IS the storage key. Cloudflare R2 (and S3 in BYO-bucket deployments) provide AES-256 encryption at rest. Identity through hash means identical bytes deduplicate automatically; modifying a document produces a new hash, not a silent overwrite. Application-layer envelope encryption (per-tenant DEK wrapped with your KMS-managed KEK) layers on top via /encryption when BYO-KMS is configured. Object Lock / WORM on R2 / S3 ships for legally-held blobs once the first customer's records-management retention class is wired — engaged via the existing legal-hold-deny-wins gate so a held doc is already byte-immutable from the application layer regardless of bucket-level lock.
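A short sketch of content addressing, assuming the storage key is the lowercase hex SHA-256 digest of the raw bytes. The `ContentStore` class is a hypothetical in-memory stand-in for the object store, used only to show deduplication.

```typescript
import { createHash } from "node:crypto";

// Assumed key format: lowercase hex SHA-256 of the raw bytes.
function contentKey(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory stand-in for an R2 / S3 bucket.
class ContentStore {
  private blobs = new Map<string, Uint8Array>();

  // Writing identical bytes twice is a no-op: the key already exists.
  put(bytes: Uint8Array): string {
    const key = contentKey(bytes);
    if (!this.blobs.has(key)) this.blobs.set(key, bytes);
    return key;
  }

  get size(): number {
    return this.blobs.size;
  }
}
```

Because the key is derived from the content, an edited document can only land under a different key; there is no way to overwrite the old bytes by reusing the name.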
Five-tier sensitivity model — with collection-driven escalation
Every document carries a sensitivity tier: public / internal / confidential / restricted / regulated. Tiers surface visually across every list (search, dashboard, collection detail, doc detail). Auto-classification proposes a tier on ingest; humans confirm. Tier changes emit `document.sensitivity-changed` events; lowering a tier on a held doc is forbidden by the deny-wins gate. Collections (matters / projects / drawers / cabinets) can carry a default sensitivity that escalates members on add — strictly higher tier wins; we never demote. A doc moved into a regulated matter becomes regulated; a doc already at restricted moved into a confidential matter stays restricted. Lowest-wins / strict-equality inheritance modes deliberately not shipped — silent demotion is a SOC 2 finding waiting to happen.
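The escalation rule reduces to taking the higher of two ordered tiers. A minimal sketch, with the tier order taken from the list above and the function name `escalateOnAdd` assumed for illustration:

```typescript
// Tier order as listed in the text, lowest to highest.
const TIERS = ["public", "internal", "confidential", "restricted", "regulated"] as const;
type Tier = (typeof TIERS)[number];

// Strictly higher tier wins; a collection default never demotes a member.
function escalateOnAdd(current: Tier, collectionDefault: Tier): Tier {
  return TIERS.indexOf(collectionDefault) > TIERS.indexOf(current)
    ? collectionDefault
    : current;
}
```

The two cases from the text map directly: `escalateOnAdd("internal", "regulated")` yields `"regulated"`, and `escalateOnAdd("restricted", "confidential")` stays `"restricted"`.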
Encrypted in transit
TLS 1.2+ on every public endpoint (Vercel-managed). Cloudflare R2 traffic uses HTTPS exclusively. Inngest webhooks are signed with `INNGEST_SIGNING_KEY`. The Cloudflare Email Worker for inbound email signs every forwarded message with `EMAIL_INGEST_SECRET` (HMAC-SHA256) before the API route accepts it.
Tenant isolation
Every row in every Postgres table carries a `tenant_id`. Every query is scoped to the caller's tenant; cross-tenant reads are impossible by construction. The hash-chained audit log is per-tenant — chain integrity in tenant A does not depend on tenant B. API keys and webhook subscriptions both bind to the tenant of the issuing user.
Outbound webhooks signed with HMAC-SHA256
Every webhook delivery carries `X-Kodori-Signature: sha256=<hex>` computed over `<X-Kodori-Timestamp>.<body>` with a per-subscription signing key. Receivers verify by recomputing the HMAC and rejecting timestamp drift > 5 minutes — that's the replay-protection window. Plaintext signing keys are shown exactly once at creation; sha256 of the full key is stored at rest. Distinct from API-key shape (`whsec_…` vs `k_…`) so a leak of one can't be reused as the other.
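A receiver-side verification sketch for the scheme above. It assumes `X-Kodori-Timestamp` carries Unix seconds and that the body is signed exactly as received (raw bytes, before JSON parsing); neither detail is confirmed by the text. The function names and the sample key are illustrative.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const MAX_DRIFT_SECONDS = 5 * 60; // replay-protection window from the text

// Recompute `sha256=<hex>` over `<timestamp>.<body>`.
function signPayload(signingKey: string, timestamp: string, body: string): string {
  const hex = createHmac("sha256", signingKey).update(`${timestamp}.${body}`).digest("hex");
  return `sha256=${hex}`;
}

function verifyWebhook(
  signingKey: string,
  signatureHeader: string, // value of X-Kodori-Signature
  timestampHeader: string, // value of X-Kodori-Timestamp (assumed Unix seconds)
  rawBody: string,
  nowSeconds: number,
): boolean {
  const ts = Number(timestampHeader);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > MAX_DRIFT_SECONDS) return false;
  const expected = Buffer.from(signPayload(signingKey, timestampHeader, rawBody));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A handler would call `verifyWebhook` with `Math.floor(Date.now() / 1000)` and reject the request before parsing the body if it returns `false`.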
API keys with opt-in scopes
Every key carries `search:read` as the always-granted baseline. Write scopes (`documents:write`, `documents:delete`, `collections:write`) are opt-in at creation via checkboxes on /api-keys. A leaked search-only key has a smaller blast radius than a leaked all-writes key, and Kodori makes that distinction explicit. Plaintext key shown once at creation; sha256 stored at rest with constant-time compare on verify.
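A sketch of minting and checking a scoped key under the rules above: `search:read` always present, write scopes opt-in, only a SHA-256 hash stored, and a constant-time comparison on verify. The key length, the `StoredKey` shape, and the function names are assumptions; only the `k_` prefix and the scope names come from the text.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

type Scope = "search:read" | "documents:write" | "documents:delete" | "collections:write";

// Hypothetical at-rest record: the plaintext key is never stored.
interface StoredKey {
  hashHex: string;
  scopes: Scope[];
}

function mintKey(optIn: Scope[]): { plaintext: string; stored: StoredKey } {
  const plaintext = `k_${randomBytes(24).toString("hex")}`; // length is an assumption
  const hashHex = createHash("sha256").update(plaintext).digest("hex");
  const scopes = Array.from(new Set<Scope>(["search:read", ...optIn])); // baseline always granted
  return { plaintext, stored: { hashHex, scopes } };
}

function authorize(stored: StoredKey, presented: string, needed: Scope): boolean {
  const presentedHash = createHash("sha256").update(presented).digest();
  const storedHash = Buffer.from(stored.hashHex, "hex");
  if (presentedHash.length !== storedHash.length) return false;
  // Compare digests in constant time, then check the requested scope.
  return timingSafeEqual(presentedHash, storedHash) && stored.scopes.includes(needed);
}
```

The plaintext returned by `mintKey` is what would be shown once at creation; everything after that works from `stored` alone.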
DLP scanning on every upload
Pattern + checksum detectors fire over the extracted text of every uploaded document — US SSNs, Luhn-validated credit cards, ABA-validated routing numbers, MRN-prefixed identifiers, AWS access keys, GitHub tokens, PEM private-key blocks, JWTs, and key=value secrets. High-confidence findings auto-escalate sensitivity to "regulated" the moment the scan lands; the document never sits at a lower tier between ingest and human review. The matched value is never stored — only a pre-redacted preview.
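Two of the checksum detectors named above are standard algorithms and can be sketched directly. The Luhn and ABA routing checks below follow their public definitions; the accepted length range and the `redactedPreview` format are assumptions, not Kodori's actual output.

```typescript
// Luhn check: from the right, double every second digit, subtract 9 from
// any result above 9, and require the total to be divisible by 10.
function luhnValid(digits: string): boolean {
  if (!/^\d{13,19}$/.test(digits)) return false; // assumed card-length range
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// ABA routing check: nine digits weighted 3, 7, 1 repeating, total mod 10 is 0.
function abaValid(digits: string): boolean {
  if (!/^\d{9}$/.test(digits)) return false;
  const weights = [3, 7, 1];
  let sum = 0;
  for (let i = 0; i < 9; i++) {
    sum += (digits.charCodeAt(i) - 48) * weights[i % 3];
  }
  return sum % 10 === 0;
}

// Hypothetical preview format: keep only the last four characters.
function redactedPreview(value: string): string {
  return "*".repeat(Math.max(0, value.length - 4)) + value.slice(-4);
}
```

Pairing a pattern match with a checksum is what separates a real card or routing number from an arbitrary digit run, which keeps high-confidence findings from firing on order numbers or phone numbers.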
Anomaly detection with agent step-up
A 15-minute cron scans the audit log for behavioural anomalies: one principal reading high volumes of regulated docs, touching all five sensitivity tiers in a window, spiking off-hours against their own 7-day baseline, repeatedly touching held documents, or running an agent loop past a tool-call ceiling. High-severity AGENT signals automatically write a deny rule on /permissions — the offending agent is paused until an owner / admin lifts it with a written rationale. Every state transition is captured on the hash-chained audit log.
Bring-your-own KMS key (envelope encryption + lifecycle audit)
Every blob is envelope-encrypted with a fresh AES-256-GCM Data Encryption Key wrapped against your tenant's Key Encryption Key. Register a customer-managed KEK in your AWS / Azure / GCP account from /encryption — revoking it instantly makes every blob unreadable, regardless of where the encrypted bytes physically reside. Rotation is additive: registering a new key auto-retires the prior; existing wrapped DEKs continue to unwrap. Every change to your key custody is hash-chain audited via tenant-kms.registered / tenant-kms.rotated / tenant-kms.disabled events with from/to keyIdSuffix in the rotated payload — auditors get "every change to key custody in YYYY" with one /audit query. Default tenants get envelope encryption via a deployment-managed key until they upgrade.
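The envelope pattern can be sketched with Node's built-in AES-256-GCM. In this sketch a local 32-byte buffer stands in for the KEK; in a real BYO-KMS setup the wrap and unwrap steps would be calls to the customer's KMS and the KEK would never leave it. The `Sealed` and `EncryptedBlob` shapes and all function names are assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface Sealed {
  iv: Buffer;
  ciphertext: Buffer;
  tag: Buffer; // GCM authentication tag
}

function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function open(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  // final() throws if the key is wrong or the ciphertext was modified.
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]);
}

interface EncryptedBlob {
  wrappedDek: Sealed; // DEK encrypted under the KEK
  body: Sealed; // document bytes encrypted under the DEK
}

function encryptBlob(kek: Buffer, bytes: Buffer): EncryptedBlob {
  const dek = randomBytes(32); // fresh Data Encryption Key per blob
  return { wrappedDek: seal(kek, dek), body: seal(dek, bytes) };
}

function decryptBlob(kek: Buffer, blob: EncryptedBlob): Buffer {
  const dek = open(kek, blob.wrappedDek);
  return open(dek, blob.body);
}
```

This shape is why revoking the KEK is enough to make stored blobs unreadable: without it no wrapped DEK can be opened, and the bodies are useless without their DEKs.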
Audit log CSV export with date + actor filters
Narrow the audit log by date range, event type, or actor (substring across user emails, agent IDs, system principals), then export the filtered set to RFC 4180 CSV for legal review or external auditor handoff. URL state preserved so a specific filter combo is shareable.
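RFC 4180 quoting is easy to get wrong by hand, so a small sketch may help when consuming or reproducing the export. The column names in the usage note are hypothetical; the quoting rules (quote fields containing commas, quotes, or line breaks; double embedded quotes; CRLF line endings) follow the RFC.

```typescript
// Quote a field only when it contains a comma, a double quote, or a line
// break; embedded double quotes are escaped by doubling them.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Rows are joined with CRLF, as RFC 4180 specifies.
function toCsv(header: string[], rows: string[][]): string {
  return [header, ...rows].map((row) => row.map(csvField).join(",")).join("\r\n") + "\r\n";
}
```

For example, `toCsv(["actor", "event"], [["a@example.com", "document.tombstoned"]])` produces a header line and one data line, each terminated by CRLF.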
External connector security — OAuth tokens encrypted at rest
Six vendor kinds — Slack, Gmail, Outlook, SharePoint, OneDrive, Google Drive — connect via OAuth from /integrations. Access + refresh tokens persist application-layer encrypted via AES-256-GCM with a scrypt-derived key from `AUTH_SECRET`; envelope-extends to your tenant's KMS-managed Key Encryption Key when BYO-KMS is configured. Tokens are NEVER returned by the public API or admin UI — the /api/v1/connectors listing surfaces lifecycle status + content counts but explicitly omits tokens, scope strings, and config payloads. Connector content is tenant-scoped: searchExternalContent and unifiedSearch only return rows from authorized connectors (status="connected"). Pause / Resume / Revoke per-row, or sweep all-paused/all-connected via bulk admin actions. Every lifecycle change emits a hash-chained audit event.
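A sketch of token encryption with a scrypt-derived key, as described above. The text names only AES-256-GCM, scrypt, and `AUTH_SECRET`; the per-record random salt, the packed `salt | iv | tag | ciphertext` layout, and the function names are assumptions made for this example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Derive a 32-byte AES key from the secret. Salt handling is an assumption.
function deriveKey(authSecret: string, salt: Buffer): Buffer {
  return scryptSync(authSecret, salt, 32);
}

// Packed layout (assumed): 16-byte salt, 12-byte IV, 16-byte tag, ciphertext.
function encryptToken(authSecret: string, token: string): string {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(authSecret, salt), iv);
  const ciphertext = Buffer.concat([cipher.update(token, "utf8"), cipher.final()]);
  return Buffer.concat([salt, iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

function decryptToken(authSecret: string, packed: string): string {
  const raw = Buffer.from(packed, "base64");
  const salt = raw.subarray(0, 16);
  const iv = raw.subarray(16, 28);
  const tag = raw.subarray(28, 44);
  const ciphertext = raw.subarray(44);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(authSecret, salt), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Since GCM authenticates as well as encrypts, a stored token that has been altered, or a rotated secret, surfaces as a decryption error rather than as a silently wrong token.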
GDPR Article 17 right-to-be-forgotten — connector content purge
Customers asking for actual deletion (not just disconnect) of synced connector data have a clean 2-click + 1-type flow on /integrations/[id]: revoke the connector to acknowledge disconnect intent, then type the connector display name exactly to confirm deletion intent. The action permanently removes every external_messages + external_documents row for the connector and lands an `external-connector.content-purged` audit event with messagesPurged + documentsPurged counts. The compliance evidence preserves who purged what, and when, without preserving the deleted content (which would defeat the deletion intent). This is the same typed-name confirmation pattern GitHub uses for repository deletion.
Connector text extraction — bytes never mirrored as Kodori documents
When a SharePoint / OneDrive / Google Drive file syncs in, Kodori downloads the bytes via the vendor's authenticated API + extracts plain text via the same cascade Kodori-native uploads use (Azure Doc Intel, Office adapters, Whisper, Claude PDF, builtin-text). The extracted text + a vendor URL pointer + a pgvector embedding land in `external_documents` — but the BLOB ITSELF is never stored. The vendor stays the source of truth. Operators deleting a file in SharePoint see it disappear from Kodori on the next sync (no copy survives). This is a deliberate posture choice over the easier "mirror as Kodori doc" alternative — keeps the audit chain clean and avoids the consistency problem of "vendor delete vs Kodori soft-delete." 50MB byte cap matches the Kodori upload limit; 2MB stored-text cap (≈500 pages dense prose) bounds storage.
Internal security policy set + responsible-disclosure intake
Ten internal security policies (Information Security, Acceptable Use, Access Control, Change Management, Data Classification, Encryption, Incident Response, Vendor Management, Backup & Disaster Recovery, Risk Assessment) — the documents a SOC 2 Type I auditor pulls when the engagement opens. Markdown source is in the public repo; auditor-grade PDF set with version history available under NDA. Companion responsible-disclosure intake at /security/responsible-disclosure with explicit in-scope / out-of-scope catalog, 1-business-day acknowledgment SLA, safe-harbor terms.
Cedar policy engine — shadow-mode with divergence observation
Customer-defined ABAC policies in Cedar DSL evaluate against every consequential mutation in parallel with the existing TS authorization gates. Today the TS gates remain authoritative; Cedar runs in observation mode. An hourly Inngest cron (`cedar-divergence-observation`) pulls write-side audit events from the last hour, replays them through `@cedar-policy/cedar-authorization`, and emits a `policy-engine.divergence` audit event when Cedar disagrees with the TS gate the action already passed. After 30+ days of zero divergences across a tenant's active policies, the per-tenant authoritative flag flips Cedar to source-of-truth. Real SDK wiring (lazy server-only wasm load, per-tenant engine cache, default Kodori v1 schema bundled) means a customer ABAC contract is a 1-day flip — not a 2-week integration project.
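The shadow-mode comparison can be sketched without touching the Cedar SDK itself. Below, `ShadowEngine` is a plain function type standing in for Cedar evaluation; no real `@cedar-policy` call is shown, and the `WriteEvent` shape and function names are hypothetical. Every replayed event already passed the TS gate, so any shadow "deny" counts as a divergence.

```typescript
type Decision = "allow" | "deny";

// Hypothetical shape for a write-side audit event being replayed.
interface WriteEvent {
  id: string;
  principal: string;
  action: string;
  resource: string;
}

// Stand-in for the policy engine under observation.
type ShadowEngine = (event: WriteEvent) => Decision;

// Events in the audit log were allowed by the authoritative gate, so the
// divergences are exactly the events the shadow engine would have denied.
function findDivergences(events: WriteEvent[], shadow: ShadowEngine): WriteEvent[] {
  return events.filter((e) => shadow(e) === "deny");
}

const DAY_MS = 24 * 60 * 60 * 1000;

// `cleanSince` is the later of "shadow mode started" and "last divergence".
function readyToFlip(cleanSince: Date, now: Date, minDays = 30): boolean {
  return now.getTime() - cleanSince.getTime() >= minDays * DAY_MS;
}
```

One limitation of replaying only audit-log events is that it observes actions the TS gate allowed; cases where Cedar would allow something the TS gate blocked never reach the log, so this sketch cannot detect them.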
Conversation export — agent transcripts as compliance evidence
AI conversation transcripts can be exported as Markdown or plain text via either the agent drawer "Download" button (UI server action) or `GET /api/agent/conversations/[id]/export?format=md|txt` (curl-able for archival scripts, for example a nightly cron that archives to S3). A per-user permission gate applies: only the conversation owner can export, while tenant admins still see every agent action via the existing per-tool audit events. Both paths emit `agent-conversation.exported` on the hash-chained audit log, so "who exported what, and when" is answerable for SARs, discovery requests, and matter-file evidentiary trails.
Compliance roadmap · Honest dates, no marketing fiction
Certifications.
We don’t claim certifications we don’t hold. Status meanings: Live today is verifiable against the running product right now; Customer-anchored means the substrate is ready and the audit + paperwork kick off when a customer contract anchors the engagement; Audit pending means the substrate is ready and we’re preparing to engage an auditor (gated on revenue / funding); Sequenced means dependent on a prior certification completing first. If a specific certification is a hard requirement for your firm today, an incumbent (iManage, NetDocuments, MasterControl) wins that comparison, and we say so on the comparison pages.
| Control | Status | Notes |
|---|---|---|
| Hash-chained audit log | Live today | Per-tenant SHA-256 chain. Tamper-evidence at the chain level, not just the row level. |
| Permission-trimmed retrieval | Live today | Deny-wins ACL enforced at the index across UI, search, agent, and public API. |
| TLS in transit + AES-256 at rest | Live today | Vercel + Cloudflare R2 defaults. BYO-bucket support at the top tier. |
| SSO via Google + Microsoft (Auth.js v5) | Live today | No passwords on Kodori. WorkOS SAML/SCIM as escape hatch for SAML-only contracts. |
| API keys with opt-in scopes | Live today | search:read baseline; documents:write, documents:delete, collections:write opt-in at creation. Plaintext shown once; sha256 stored at rest. |
| HMAC-signed outbound webhooks | Live today | X-Kodori-Signature: sha256=<hex> over <timestamp>.<body>. Replay-protection via 5-minute timestamp drift check. |
| GDPR / UK-GDPR / CCPA technical compliance | Live today | Articles 15-22 rights mapped to endpoints (data export, rectification, tombstone + connector purge for erasure, portability). Section-by-section conformance claim at /legal/gdpr. Executable DPA + EU SCCs + UK Addendum available on request. |
| HIPAA technical safeguards + BAA | Customer-anchored | Substrate ready today: BAA-eligible sub-processor chain (Neon, Inngest, Cloudflare, Anthropic, Vercel) is contiguous from Day 1. BAA template, technical-safeguards review, and HIPAA-expert audit kick off when a healthcare design-partner contract anchors the engagement. |
| SOC 2 Type I | Audit pending | Substrate is audit-ready today. Auditor engagement (~$15-30K + compliance-automation tooling) + 3-4 month evidence-collection window kick off on revenue trigger or funding event. |
| Office add-ins (Outlook / Word / Excel / PowerPoint) | Live today | All four task panes shipped — sideload via /office. One Kodori API key signs into all four. Outlook files email + attachments from the ribbon; Word / Excel / PowerPoint save the active document as new doc OR new version with optional version label. |
| External connectors (Slack / M365 / Google Workspace) | Live today | Six vendors live E2E: Slack, Gmail, Outlook, SharePoint, OneDrive, Google Drive. OAuth tokens encrypted at rest (AES-256-GCM, scrypt-derived key from AUTH_SECRET; envelope-extends to BYO-KMS). Tenant-scoped retrieval, lifecycle audit events, recurring sync cron + per-tenant cadence override. |
| GDPR Article 17 right-to-be-forgotten (connector content) | Live today | Typed-confirmation gate on /integrations/[id] permanently removes every synced message + document for a revoked connector. Audit chain preserves the purge intent + counts without preserving the content. SAR-friendly compliance trail. |
| BYO-KMS tenant-key re-wrap orchestration | Live today | Inngest re-wrap function with concurrency.key=tenantId; lifecycle events (rewrap-requested / -progress / -completed / -orphaned / -acknowledged) on the hash-chained audit log. Per-vendor KMS SDK integration (AWS KMS / Azure Key Vault / GCP KMS) drops in cleanly when a customer BYO-KMS engagement anchors the choice. |
| Cedar policy engine — shadow-mode with divergence observation | Live today | Real @cedar-policy/cedar-authorization SDK wired (lazy server-only wasm load, per-tenant engine cache, default Kodori v1 schema). Hourly cedar-divergence-observation Inngest cron replays write-side audit events through Cedar and emits policy-engine.divergence events when Cedar disagrees with the TS gate. Customer-side ABAC contract is a 1-day flip-to-authoritative after 30+ days of zero divergences. |
| SOC 2 Type II | Sequenced | Sequenced after Type I — 12-month observation window, then Type II report. Cannot be parallelized. |
| 21 CFR Part 11 conformance claim | Live today | Section-by-section conformance claim at /legal/21-cfr-part-11 covering Subpart B (§§11.10, 11.50, 11.70) and Subpart C (§§11.100, 11.200, 11.300). Hash-chained audit chain + SSO-anchored signatures + reversible-transaction model are the load-bearing controls. |
| EU AI Act Article 11 / 12 / 14 / 50 documentation | Live today | AI disclosure document at /legal/ai-disclosure — Article 11 technical documentation (model providers, capability scope), Article 12 logging (audit-chain capture of every agent action), Article 14 human oversight (consequential-action confirmation gate, reversibility, tool-call ceiling), Article 50 transparency (AI labeling). |
| ISO 27001 audit | Customer-anchored | ISMS implementation (~6 months) + audit (~$20-40K) kick off when an EU enterprise prospect anchors the timeline. Lower-priority than SOC 2 unless contractually required. |
| SEC 17a-4 audit-trail-alternative posture | Live today | Conformance claim at /legal/sec-17a-4 — section-by-section mapping to the November 2022 amendment (§§17a-4(f)(2)(i)-(iii) + 17a-4(f)(3)(i)(A)-(E)). Customer-firm representation letter + technical-architecture document available on request to support FINRA filings. |
Common questions.
- Is Kodori SOC 2 certified today?
- Not today. The substrate is audit-ready: hash-chained, tamper-evident audit log, deny-wins ACL, SSO-only auth, encryption at rest + in transit, tenant isolation. What's missing is the auditor engagement itself ($15-30K + ~$8-15K/yr in compliance-automation tooling like Drata or Vanta + 3-4 month evidence-collection window). Status today: SOC 2 Type I is `audit-pending`, and the engagement starts on a revenue trigger or funding event. Type II is sequenced after Type I (12-month observation period). We're honest about this on every comparison page: incumbents like iManage and NetDocuments win today on current certification posture. The audit catches up to the substrate, not the other way around.
- Will you sign a BAA?
- HIPAA + BAA is `customer-anchored` today — the substrate is ready (hash-chained audit + AES-256 encryption + tenant isolation + permission-trimmed retrieval) and the BAA chain is contiguous from Day 1 because every sub-processor on the data path is HIPAA-eligible (Neon, Inngest, Cloudflare R2, Anthropic, Vercel). What kicks off when a healthcare design-partner contract anchors the engagement: BAA template legal review, technical-safeguards review by a HIPAA expert, sub-processor BAA execution. Estimated 3-6 weeks once the design partner is identified. Reach out via /about if you're a healthcare prospect — design-partner BAA conversations are open.
- How is data encrypted?
- In transit: TLS 1.2+ on every public endpoint. At rest: AES-256 via Cloudflare R2 (or S3 in BYO-bucket deployments) for blob storage; Neon Postgres encrypts at rest. Application-level encryption of sensitive fields via per-tenant envelope encryption (AES-256-GCM Data Encryption Keys wrapped against your tenant's Key Encryption Key) ships today via the BYO-KMS lifecycle at /encryption — register a customer-managed KEK in your AWS / Azure / GCP account and revoking it instantly makes every blob unreadable. Per-vendor KMS SDK wiring (AWS KMS / Azure Key Vault / GCP KMS) drops in cleanly when a customer BYO-KMS engagement anchors the specific cloud — until then the orchestration runs against the deployment-managed envelope key.
- Can we BYO encryption keys?
- BYO-key via KMS — orchestration shipped today. Register a customer-managed KEK in your AWS / Azure / GCP account from /encryption; revoking it instantly makes every blob unreadable. Rotation is additive: registering a new key auto-retires the prior; existing wrapped DEKs continue to unwrap. Re-wrap of every DEK against a new active key runs as an Inngest function with concurrency.key=tenantId and emits requested/progress/completed/orphaned audit events. Per-vendor KMS SDK wiring (AWS KMS / Azure Key Vault / GCP KMS) drops in cleanly when a customer BYO-KMS engagement anchors the specific cloud — until then the orchestration runs against the deployment-managed envelope key.
- How are external connectors (Slack / M365 / Google) secured?
- OAuth tokens encrypted application-layer at rest (AES-256-GCM with a scrypt-derived key from AUTH_SECRET; envelope-extends to your tenant's KMS-managed Key Encryption Key when BYO-KMS is configured). Tokens are NEVER returned by the public API or admin UI — the /api/v1/connectors listing surfaces lifecycle status + content counts but explicitly omits tokens, scope strings, and config payloads. Connector content is tenant-scoped (only authorized connectors contribute to retrieval). Per-vendor refresh-token handling: Microsoft Graph rotates refresh tokens; Google forces consent on every re-OAuth so revoked-then-reconnected accounts re-issue tokens cleanly. Slack uses bot tokens scoped to channels you invite the bot to — operator-controlled scope expansion. Vendor stays source-of-truth: file bytes are NEVER mirrored as Kodori documents — only extracted text + URL pointers + embeddings persist for retrieval. Operators wanting actual deletion of synced data use the typed-confirmation purge flow on /integrations/[id].
- What does a deletion actually do?
- `tombstoneDocument` flips the document's status to `tombstoned` and emits a `document.tombstoned` event with the actor's reason. The bytes stay in object storage and the audit trail stays intact during the retention window. Hard purge runs at retention-class expiry behind a daily Inngest cron with a longer grace period — review-disposition documents land at /retention/review for human confirmation; auto-tombstone-disposition documents flow through the cron without human gate. Held documents refuse all destructive operations regardless of retention status — hold-deny-wins.
- How does Kodori isolate one tenant from another?
- Every row in every Postgres table carries `tenant_id`. Every query is scoped — there's no admin-bypass for cross-tenant reads, no shared cache, no shared search index. The hash-chained audit log is per-tenant, so chain integrity in tenant A does not depend on tenant B. API keys bind to the tenant of the user who minted them. Permission-trimming is the same SQL gate everywhere it matters.
- What can the AI agent see and do?
- Exactly what the user it's acting for can see and do. The agent uses the same MCP tool catalog the UI uses, with the same `canReadDocument` gate. There is no privileged "agent role". Every agent action emits an event with `actorKind: agent`, so you can answer "what did the AI do this week?" with a single audit-log filter. For consequential actions (delete, restore, sensitivity change, retention change, hold change, bulk over 10) the agent will not invent a reason — it asks the user before calling the tool.
- Where do I report a security issue?
- Email security@kumokodo.ai with reproduction details. We respond within 1 business day and follow a coordinated-disclosure timeline. A formal bug-bounty program is gated on the same revenue / funding trigger as the SOC 2 Type I auditor engagement — until then we honor responsible-disclosure reports manually with named acknowledgment in /security release notes (no monetary payout pre-program).
Have a security question or a finding?
Email us at security@kumokodo.ai. We respond within one business day and follow a coordinated-disclosure timeline. Bug-bounty program lands alongside SOC 2 Type I.