Email + chat attachments — Outlook, Gmail, Slack files

Connectors today ship in two shapes:

- **File-storage vendors** (SharePoint, OneDrive, Google Drive): every synced item is a document.
- **Message vendors** (Slack, Outlook, Gmail): the message body indexes into `external_messages`, and file attachments land in `external_documents`.

After v0.7.36 (Slack files) + v0.7.37 (Outlook + Gmail attachments), all six connector kinds surface attached file content uniformly. A PDF contract emailed via Outlook, a board deck attached to a Gmail thread, or a .docx shared in a Slack channel all flow into the same extraction pipeline as a direct SharePoint upload — and surface in `searchExternalContent` on body content, not just filename.

## What's collected

- **Slack files**: every file uploaded into a channel the bot is invited to. `files.list` is workspace-wide, so coverage is determined by which channels you've invited @Kodori to.
- **Outlook attachments**: every `fileAttachment` on every message in your Inbox. Inline images (signature scans, embedded screenshots) are skipped because they are already in the message body. `itemAttachment` (forwarded emails) and `referenceAttachment` (OneDrive links) are skipped to avoid duplicating content that is already synced from its actual source.
- **Gmail attachments**: every part of every message that has `body.attachmentId` set and a discoverable filename (either `payload.filename` or the Content-Disposition header).
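The Outlook and Gmail selection rules above can be sketched as two small filters. This is a minimal illustration, not Kodori's actual connector code: the function names are hypothetical, and the input shapes assume the attachment objects returned by Microsoft Graph (`@odata.type`, `isInline`) and the message part tree returned by the Gmail API (`filename`, `headers`, `body.attachmentId`, `parts`).

```python
from __future__ import annotations

from email.message import Message
from typing import Any, Iterator

FILE_ATTACHMENT = "#microsoft.graph.fileAttachment"


def should_collect_outlook(attachment: dict[str, Any]) -> bool:
    """Keep only non-inline fileAttachment entries (hypothetical helper)."""
    if attachment.get("@odata.type") != FILE_ATTACHMENT:
        # itemAttachment and referenceAttachment are skipped.
        return False
    return not attachment.get("isInline", False)


def _disposition_filename(headers: list[dict[str, str]]) -> str | None:
    """Pull a filename out of a Content-Disposition header, if one exists."""
    for header in headers:
        if header.get("name", "").lower() == "content-disposition":
            msg = Message()
            msg["Content-Disposition"] = header.get("value", "")
            return msg.get_filename()
    return None


def iter_gmail_attachments(part: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Walk a Gmail payload tree, yielding (attachment_id, filename) pairs."""
    attachment_id = part.get("body", {}).get("attachmentId")
    if attachment_id:
        filename = part.get("filename") or _disposition_filename(
            part.get("headers", [])
        )
        if filename:
            yield attachment_id, filename
    # Multipart messages nest their parts, so recurse into children.
    for child in part.get("parts", []):
        yield from iter_gmail_attachments(child)
```

A part with an `attachmentId` but no filename in either location yields nothing, which matches the "discoverable filename" condition above.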

## How extraction works

Each new attachment row in `external_documents` fires an `external-document/extract.requested` event. The extractor cascade picks the right adapter based on mime type — Azure Doc Intel for PDFs, Office adapters for .docx/.xlsx/.pptx, Whisper for audio, Claude vision as the last LLM-driven fallback. The extracted text lands on `external_documents.text` + becomes searchable via the FTS + pgvector hybrid retrieval (D226).
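As a rough illustration of mime-based adapter selection, the sketch below maps a mime type to an adapter label. The function name and the returned labels are hypothetical, and the sketch treats the vision fallback as a catch-all. The real cascade may try several adapters in order and presumably rejects unsupported types, which this sketch does not model.

```python
# Standard mime types for the three Office formats named above.
OFFICE_MIMES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",    # .docx
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",          # .xlsx
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",  # .pptx
}


def pick_adapter(mime_type: str) -> str:
    """Return a hypothetical adapter label for the given mime type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = mime_type.split(";")[0].strip().lower()
    if mime == "application/pdf":
        return "azure_doc_intel"
    if mime in OFFICE_MIMES:
        return "office"
    if mime.startswith("audio/"):
        return "whisper"
    # Last resort in this sketch: the LLM-driven vision fallback.
    return "claude_vision"
```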

## Caps and back-pressure

- **50MB byte cap** per attachment (matches the Kodori upload limit). Oversize attachments fail with `oversize_<bytes>_bytes`, which is visible in `extraction_error`.
- **2MB stored-text cap**: extracted text is truncated at 2MB (≈500 pages of dense prose).
- **Outlook fan-out cap of 50 messages-with-attachments per sync run**: subsequent ticks pick up the rest naturally.
- **Gmail attachments** are bounded by the per-run message cap (no separate attachment cap; every message's attachments process inline).
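The two size caps can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, "MB" is taken to mean 1024 × 1024 bytes, and the stored-text cap is assumed to be measured in UTF-8 bytes rather than characters. Only the `oversize_<bytes>_bytes` error format comes from the text above.

```python
from __future__ import annotations

# Assumption: "MB" means 1024 * 1024 bytes; the product may use 10**6 instead.
MAX_ATTACHMENT_BYTES = 50 * 1024 * 1024
MAX_STORED_TEXT_BYTES = 2 * 1024 * 1024


def check_size(size_bytes: int) -> str | None:
    """Return an extraction_error code for oversize attachments, else None."""
    if size_bytes > MAX_ATTACHMENT_BYTES:
        return f"oversize_{size_bytes}_bytes"
    return None


def truncate_text(text: str) -> str:
    """Truncate extracted text to the stored-text cap, measured in UTF-8 bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= MAX_STORED_TEXT_BYTES:
        return text
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded[:MAX_STORED_TEXT_BYTES].decode("utf-8", errors="ignore")
```

Truncating on bytes rather than characters keeps the stored size predictable for non-ASCII text, at the cost of possibly dropping one partial character at the cut.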

## What's NOT collected

- Slack DM file uploads (`im.files`): requires `im:read` + `im:history` scopes; deferred until customer signal
- Outlook `itemAttachment` recursion (forwarded-email file attachments): same reason
- Gmail attachments in labels other than INBOX: currently only the Inbox label syncs
- Inline images / signature graphics: intentional skip

## Retry behavior

Failed attachment extractions stick to the row with `extraction_error` set. There are two recovery paths:

- **Manual: "Retry failed" button on /integrations/[id]** (admin / owner only). It appears next to the failed-count pill on the extraction status panel when failures > 0. Each click fans out up to 500 retries at once; if more remain, click again. The request returns within seconds. Use this after a known outage.
- **Automatic: 6-hour retry-cron (D231)** sweeps stale failures across every connected connector and re-fires extract events. Permanent failures (oversize, unsupported mime, vendor 404) get one cheap retry every 6h before stopping; transient outages (Anthropic 503, Azure rate-limit storm) recover within 1-4 retries without operator action.
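The "500 at a time, click again if more remain" behaviour of the manual path can be sketched as a simple batch fan-out. The function name, the `emit` callback, and the event payload shape are hypothetical. Only the event name and the batch size of 500 come from the text above.

```python
from typing import Callable

RETRY_BATCH_SIZE = 500
EXTRACT_EVENT = "external-document/extract.requested"


def retry_failed(
    failed_ids: list[str],
    emit: Callable[[str, dict], None],
) -> int:
    """Re-fire extract events for one batch and return how many failures remain."""
    batch = failed_ids[:RETRY_BATCH_SIZE]
    for document_id in batch:
        # Payload shape is an assumption made for illustration only.
        emit(EXTRACT_EVENT, {"document_id": document_id})
    return len(failed_ids) - len(batch)
```

A caller with 1,200 failed rows would get 700 back from the first call, signalling that another click (or another call) is needed.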
