Kodori's extraction cascade is Azure Document Intelligence → Google Document AI → Claude vision → built-in text. Each tier reports itself as unsupported until configured, so the registry walks past unconfigured tiers cleanly.
This article walks through wiring a Google Document AI processor as the second-tier OCR. Useful when:
- You don't have Azure Document Intelligence provisioned (Azure remains primary if you ever do; DocAI keeps working as the documented fallback).
- You want better OCR than Claude vision on dense scans, handwriting, or multi-page PDFs (DocAI is purpose-built; Claude vision is general-purpose).
- You want lower per-page cost than the LLM path for large documents.
**One-time GCP setup**
1. Open <https://console.cloud.google.com/ai/document-ai>.
2. Create a project (or pick an existing one) — note the Project ID.
3. Click **Create Custom Processor** → choose the **Document OCR** processor type. (Layout Parser also works if you want positional output; Form Parser is for invoice/receipt workflows that we route through the auto-classify pipeline instead.)
4. Pick a region — `us` for general use, `eu` for EU-resident processing.
5. After creation, copy the **Processor ID** from the processor detail page.
6. Create a service account at IAM & Admin → Service Accounts → Create. Grant it the **Document AI API User** role.
7. Generate a JSON key for the service account. Download the key file.
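Steps 1 and 6–7 can also be scripted with the gcloud CLI (processor creation in steps 3–5 still happens in the console). A sketch, assuming a project named `my-project` and a service account named `kodori-docai` — both placeholders, substitute your own:

```shell
# Enable the Document AI API on the project.
gcloud services enable documentai.googleapis.com --project=my-project

# Create the service account and grant it the Document AI API User role.
gcloud iam service-accounts create kodori-docai --project=my-project
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:kodori-docai@my-project.iam.gserviceaccount.com" \
  --role="roles/documentai.apiUser"

# Mint a JSON key for the service account (step 7's key file).
gcloud iam service-accounts keys create key.json \
  --iam-account="kodori-docai@my-project.iam.gserviceaccount.com"
```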
**Wire it into Kodori**
Set these env vars on the Kodori deployment (Vercel project → Settings → Environment Variables, or your equivalent):
```
GOOGLE_DOCAI_PROJECT_ID = <gcp project id from step 2>
GOOGLE_DOCAI_PROCESSOR_ID = <processor id from step 5>
GOOGLE_DOCAI_LOCATION = us   # or eu
GOOGLE_APPLICATION_CREDENTIALS_JSON = {"type":"service_account","project_id":"…",…}
```
The last variable holds the **full contents** of the service account JSON file, on one line. On Vercel, paste the full JSON into the env-var value field; Vercel handles the multi-line storage. If you're running on Cloud Run / GCE / GKE, you can omit `GOOGLE_APPLICATION_CREDENTIALS_JSON` and let Application Default Credentials handle auth automatically.
**What happens after**
On the next document upload, Kodori's registry checks Azure first (skipping it if not configured), then DocAI. Once `GOOGLE_DOCAI_PROJECT_ID` and `GOOGLE_DOCAI_PROCESSOR_ID` are both present in the environment, every PDF / PNG / JPEG / WebP / GIF / BMP / TIFF upload routes through DocAI. Office documents (.docx / .xlsx / .pptx) stay on the pure-JS adapters, since those are free.
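The registry's tier walk can be sketched like this. The tier IDs follow the cascade above, but the env-var names for the Azure and Claude tiers (`AZURE_DOCINTEL_ENDPOINT`, `ANTHROPIC_API_KEY`) are assumptions for illustration:

```typescript
type Env = Record<string, string | undefined>;

// Ordered cascade: first configured tier wins.
const tiers: { id: string; supported: (env: Env) => boolean }[] = [
  { id: "azure-docintel", supported: (e) => !!e.AZURE_DOCINTEL_ENDPOINT },
  {
    id: "google-docai",
    // DocAI counts as configured only when BOTH IDs are set.
    supported: (e) => !!e.GOOGLE_DOCAI_PROJECT_ID && !!e.GOOGLE_DOCAI_PROCESSOR_ID,
  },
  { id: "claude-vision", supported: (e) => !!e.ANTHROPIC_API_KEY },
  { id: "builtin-text", supported: () => true }, // always available
];

/** Walks the cascade, skipping unconfigured tiers, and returns the winner's ID. */
export function pickExtractor(env: Env = process.env): string {
  return tiers.find((t) => t.supported(env))!.id;
}
```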
The doc-detail page shows which extractor processed each version under "Extraction status" — confirm `extractor: google-docai` after your first post-config upload.
**Caps + costs**
- **20 MB per document, sync API.** Google's documented limit for synchronous processing. Larger documents fail with `unsupported` extraction status; the document is still ingested, and you'll be able to re-extract once the async batch API lands (planned, not yet shipped).
- **~$0.0015 per page** for the Document OCR processor (consult Google's current pricing). Cheaper per page than Claude vision; more expensive than the free Office / built-in adapters.
- **Lazy SDK import.** Deployments that never invoke DocAI pay no cold-start cost — the gRPC dependency tree only loads on the first DocAI-routed extraction.
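The size guard and cost math above amount to a couple of one-liners. The constant and function names here are illustrative, not Kodori's actual code:

```typescript
const DOCAI_SYNC_LIMIT_BYTES = 20 * 1024 * 1024; // Google's 20 MB sync API limit
const DOCAI_USD_PER_PAGE = 0.0015; // check Google's current pricing

/** Documents over the sync limit are still ingested, but marked unsupported. */
export function withinSyncLimit(sizeBytes: number): boolean {
  return sizeBytes <= DOCAI_SYNC_LIMIT_BYTES;
}

/** Rough cost of OCRing a document of `pages` pages through DocAI. */
export function estimatedCostUsd(pages: number): number {
  return pages * DOCAI_USD_PER_PAGE;
}
```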
**Falling back**
If DocAI returns an error (rate limit, processor disabled, malformed input), the document's extraction status is set to `failed` with the error message attached. The dashboard's "Run extraction on N pending" button retries failed documents; to route the retry through Claude vision instead, temporarily unset the DocAI env vars first.
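The failure handling reduces to a small wrapper: any thrown error becomes a `failed` status carrying the message, which is what the dashboard's retry button keys on. A sketch with illustrative names, not Kodori's actual code:

```typescript
type ExtractionResult =
  | { status: "extracted"; text: string }
  | { status: "failed"; error: string };

/** Runs an extractor and maps any thrown error to a `failed` status. */
export async function runExtraction(
  extract: () => Promise<string>,
): Promise<ExtractionResult> {
  try {
    return { status: "extracted", text: await extract() };
  } catch (err) {
    // Rate limits, disabled processors, and malformed input all land here.
    return { status: "failed", error: err instanceof Error ? err.message : String(err) };
  }
}
```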