Bulk ingest via the REST API

POST raw bytes to /api/v1/documents with metadata in headers. Designed for watch folders, ETL jobs, and scanners: anywhere the browser presigned-PUT flow is overkill.

Updated 2026-04-25

For programmatic ingest — watch-folder daemons, ETL jobs, scanners, anything server-to-server — skip the multi-step presigned-PUT dance and POST the bytes directly:

POST /api/v1/documents
Authorization: Bearer k_<prefix>_<secret>
Content-Type: <mime type of bytes>
X-Kodori-Display-Name: <user-facing name; required, max 512 chars>
X-Kodori-Sensitivity: internal | confidential | restricted | regulated | public (optional; default internal)
X-Kodori-Collection-Id: <uuid> (optional; pin to this collection on create)
X-Kodori-Metadata: <JSON object> (optional; merged into document metadata)

<raw bytes as request body>
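The header contract above can be sketched as a small Python helper for a daemon or ETL job. This is a minimal sketch, not an official client: the function names build_ingest_headers and build_ingest_request are hypothetical, and the client-side validation only mirrors the limits stated above (required display name, 512-character maximum, the five sensitivity values). The URL is taken from the curl example in this article.

```python
import json
from typing import Any, Dict, Mapping, Optional
from urllib.request import Request

API_URL = "https://kodori.ai/api/v1/documents"  # host taken from this article's curl example
SENSITIVITIES = {"internal", "confidential", "restricted", "regulated", "public"}
MAX_DISPLAY_NAME_CHARS = 512


def build_ingest_headers(
    api_key: str,
    mime_type: str,
    display_name: str,
    sensitivity: Optional[str] = None,
    collection_id: Optional[str] = None,
    metadata: Optional[Mapping[str, Any]] = None,
) -> Dict[str, str]:
    """Build the request headers (hypothetical helper, not part of any SDK)."""
    if not display_name:
        raise ValueError("X-Kodori-Display-Name is required")
    if len(display_name) > MAX_DISPLAY_NAME_CHARS:
        raise ValueError("display name exceeds 512 characters")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": mime_type,
        "X-Kodori-Display-Name": display_name,
    }
    # Optional headers are omitted entirely so the server applies its defaults.
    if sensitivity is not None:
        if sensitivity not in SENSITIVITIES:
            raise ValueError(f"unknown sensitivity: {sensitivity!r}")
        headers["X-Kodori-Sensitivity"] = sensitivity
    if collection_id is not None:
        headers["X-Kodori-Collection-Id"] = collection_id
    if metadata is not None:
        # json.dumps escapes non-ASCII by default, keeping the header value ASCII.
        headers["X-Kodori-Metadata"] = json.dumps(dict(metadata), separators=(",", ":"))
    return headers


def build_ingest_request(body: bytes, headers: Mapping[str, str]) -> Request:
    """Wrap the raw bytes and headers in an unsent POST request."""
    return Request(API_URL, data=body, headers=dict(headers), method="POST")
```

Passing the returned Request to urllib.request.urlopen would send it; the sketch stops short of that so it can be exercised without network access or a real key.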

The endpoint hashes the body, deduplicates it against your content-addressable store (an identical blob you uploaded last week costs no new storage), creates the DocumentObject, and fires the same extraction + auto-classify pipeline a UI upload triggers. It returns 201 with documentId, versionHash, sizeBytes, mimeType, displayName, and a deduped boolean.
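A client will usually want those response fields in a typed form, for example to log when an upload was deduplicated. The sketch below assumes the 201 body is a JSON object with the six fields as top-level keys; the article lists the field names but not the exact envelope, so treat that shape as an assumption. IngestResult and parse_ingest_response are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestResult:
    document_id: str
    version_hash: str
    size_bytes: int
    mime_type: str
    display_name: str
    deduped: bool


def parse_ingest_response(status: int, body: bytes) -> IngestResult:
    """Turn a 201 response into an IngestResult; raise on any other status."""
    if status != 201:
        raise RuntimeError(f"ingest failed with HTTP {status}")
    # Assumption: the six documented fields sit at the top level of a JSON object.
    payload = json.loads(body)
    return IngestResult(
        document_id=payload["documentId"],
        version_hash=payload["versionHash"],
        size_bytes=int(payload["sizeBytes"]),
        mime_type=payload["mimeType"],
        display_name=payload["displayName"],
        deduped=bool(payload["deduped"]),
    )
```

A watch-folder daemon could check result.deduped to skip its own post-upload bookkeeping when the bytes were already stored.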

Example:

curl -X POST https://kodori.ai/api/v1/documents \
  -H "Authorization: Bearer $KODORI_KEY" \
  -H "Content-Type: application/pdf" \
  -H "X-Kodori-Display-Name: 2024-Q3 BigCo Master Service Agreement.pdf" \
  -H "X-Kodori-Sensitivity: confidential" \
  -H "X-Kodori-Metadata: {\"keywords\":[\"BigCo\",\"MSA\"],\"docType\":\"contract\"}" \
  --data-binary "@./msa.pdf"

Hard cap: 50 MB per request. The route returns 413 for any body over the cap; larger files need the multi-step presigned-PUT flow, the same one browser uploads use. Most accounting / legal / AEC documents fit comfortably.
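A batch job can avoid a wasted round trip by checking file size before it picks an upload path. The sketch below is illustrative only: the article does not say whether 50 MB means decimal or binary megabytes, so the threshold assumes the smaller decimal value (50,000,000 bytes), which stays under the cap under either reading. The function names and the two flow labels are hypothetical.

```python
import os

# Assumption: decimal megabytes. This is the conservative choice, since
# 50 * 1000 * 1000 is below 50 * 1024 * 1024.
DIRECT_POST_LIMIT_BYTES = 50 * 1000 * 1000


def choose_upload_flow(size_bytes: int) -> str:
    """Return which flow to use for a payload of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= DIRECT_POST_LIMIT_BYTES:
        return "direct-post"
    return "presigned-put"


def choose_upload_flow_for_file(path: str) -> str:
    """Same decision, based on the file's size on disk."""
    return choose_upload_flow(os.path.getsize(path))
```

Even with this check in place, a client should still handle a 413 from the server, since the server's limit is the authoritative one.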

Required scope: documents:write. The endpoint never bypasses the standard pipeline — same content-hash identity, same audit-event emission ("document.created"), same extraction + embedding + auto-classify cascade. Webhook subscribers see new uploads exactly as they would for UI ingest.