Getting Started
Expunct ships two API pillars. Pick the one you need first — you can come back for the other.
| Pillar | Best for | Status | Shortest first-success path |
|---|---|---|---|
| Redaction | Sanitizing text, files, prompts, or LLM I/O | GA | Python or Node SDK, or curl |
| Document Intelligence | Parsing, extracting, or safe-parsing PDF/DOCX for AI | Beta — gated per tenant | Raw HTTP (curl / httpx / fetch) |
Document Intelligence is in beta. Its endpoints return `403` until the feature flag is enabled for your tenant. If you signed up today, the redaction path works immediately; Document Intelligence requires beta enablement (available on request for Starter tenants, with approved Professional and Business tenants as the primary rollout tier).
1. Get an API key
Create an API key from the dashboard or via the API:
```bash
curl -X POST https://api.expunct.ai/api/v1/api-keys \
  -H "Content-Type: application/json" \
  -d '{"name": "my-first-key"}'
```

API keys use the format `pk_live_...` for production or `pk_test_...` for testing.
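Because keys are plain strings, a small guard can stop a script meant for testing from running against production by accident. The sketch below relies only on the two prefixes documented above; the `EXPUNCT_API_KEY` environment variable and the helper names are hypothetical choices for illustration, not something the SDK is documented to read.

```python
import os

# Prefixes documented for Expunct API keys.
LIVE_PREFIX = "pk_live_"
TEST_PREFIX = "pk_test_"


def key_environment(api_key: str) -> str:
    """Classify a key as "live" or "test" by its documented prefix."""
    if api_key.startswith(LIVE_PREFIX):
        return "live"
    if api_key.startswith(TEST_PREFIX):
        return "test"
    raise ValueError("expected a key starting with pk_live_ or pk_test_")


def load_test_key(var: str = "EXPUNCT_API_KEY") -> str:
    """Read a key from a hypothetical environment variable, refusing live keys.

    Raises ValueError if the variable is unset or malformed, and
    RuntimeError if it holds a live key.
    """
    api_key = os.environ.get(var, "")
    if key_environment(api_key) != "test":
        raise RuntimeError(f"{var} holds a live key; use a pk_test_ key here")
    return api_key
```

Keeping the key out of source code also avoids committing it to version control.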
Path A — Redact text (works on every plan)
This is the fastest first success and works for every signed-up tenant.
Install an SDK
```bash
# Python SDK
pip install expunct
```

Redact text
```python
from expunct import Expunct

client = Expunct(api_key="pk_live_...")

result = client.redact.text(
    text="John Smith's email is john@example.com and SSN is 123-45-6789",
)
print(result.redacted_text)
# "[PERSON]'s email is [EMAIL_ADDRESS] and SSN is [US_SSN]"
```

The response includes the redacted text and a list of findings; see Redaction for the full schema.
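When spot-checking output, for example in your own unit tests, it can help to list which placeholders ended up in `redacted_text`. The sketch below assumes the default `[ENTITY_TYPE]` placeholder style shown in the example; it is a convenience check only, and the findings list in the response remains the authoritative record of what was detected. The helper names are hypothetical.

```python
import re
from collections import Counter

# Matches placeholders like [PERSON] or [US_SSN]; assumes the default
# "[ENTITY_TYPE]" style shown above. Bracketed uppercase words that were
# already present in the input text would also match.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def placeholder_labels(redacted_text: str) -> list[str]:
    """Return placeholder labels in the order they appear."""
    return PLACEHOLDER.findall(redacted_text)


def placeholder_counts(redacted_text: str) -> Counter:
    """Count how many times each placeholder label occurs."""
    return Counter(placeholder_labels(redacted_text))


sample = "[PERSON]'s email is [EMAIL_ADDRESS] and SSN is [US_SSN]"
print(placeholder_labels(sample))
# ['PERSON', 'EMAIL_ADDRESS', 'US_SSN']
```

With the SDK example above, call `placeholder_labels(result.redacted_text)`.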
Path B — Document Intelligence (beta)
Today, raw HTTP is the only reliable first-success path for Document Intelligence. Do not assume the published Python SDK, Node SDK, CLI, or MCP packages expose document-intelligence operations yet. Use `curl`, `httpx`, or `fetch` until package support is explicitly published and documented. Status is tracked on the Document Intelligence page.
Document Intelligence has three operations on PDF and DOCX:
| Operation | Endpoint | What it returns | Use when |
|---|---|---|---|
| Parse | POST /api/v1/parse | Canonical structure + markdown + chunks | You need RAG-ready text and structure |
| Extract | POST /api/v1/extract | JSON matching your schema (or a built-in template_id) | You need specific fields (invoice totals, dates, names) |
| Safe-Parse | POST /api/v1/workflows/safe-parse | Sanitized canonical + markdown + chunks (no PII) | You need parse output that is safe to embed, store, or send to a third-party LLM |
safe_parse is parse + sanitize as one workflow — not a separate parser. Use it when the document is sensitive and you want only sanitized artifacts persisted.
First success — submit a safe-parse job
```bash
# 1. Submit
curl -X POST https://api.expunct.ai/api/v1/workflows/safe-parse \
  -H "X-API-Key: pk_live_..." \
  -F "file=@document.pdf" \
  -F "language=en"
# Response: { "id": "7a8b...", "status": "queued", "workflow_kind": "safe_parse", ... }

# 2. Poll
curl https://api.expunct.ai/api/v1/documents/jobs/7a8b... \
  -H "X-API-Key: pk_live_..."

# 3. Once status == "completed", read an artifact
curl https://api.expunct.ai/api/v1/documents/<artifact_id>/content \
  -H "X-API-Key: pk_live_..."
```

If the first call returns `403`, your tenant does not yet have the `document_safe_parse_workflow` flag enabled. Contact support to enable the Document Intelligence beta for your account.
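The same three steps can be scripted from Python. This sketch swaps in the standard library's `urllib.request` instead of `httpx` so it runs with no extra dependencies. It assumes the response bodies are JSON with the `id` and `status` fields shown above; treating `"failed"` as a terminal status is a guess, not documented here; the helper names are hypothetical; and how to obtain an `artifact_id` from a completed job is not covered in this guide, so check the Safe Parse reference.

```python
import json
import mimetypes
import os
import time
import urllib.error
import urllib.request
import uuid

BASE_URL = "https://api.expunct.ai/api/v1"


def encode_multipart(fields: dict, file_field: str, filename: str,
                     content: bytes, file_type: str) -> tuple[bytes, str]:
    """Build a multipart/form-data body, like curl's -F flags."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append((f"--{boundary}\r\nContent-Disposition: form-data; "
                       f'name="{name}"\r\n\r\n{value}\r\n').encode())
    header = (f"--{boundary}\r\nContent-Disposition: form-data; "
              f'name="{file_field}"; filename="{filename}"\r\n'
              f"Content-Type: {file_type}\r\n\r\n")
    chunks.append(header.encode() + content + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def _send(req: urllib.request.Request) -> bytes:
    """Send a request; map 403 to a clear beta-not-enabled error."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            raise PermissionError(
                "403: Document Intelligence beta is not enabled for this tenant"
            ) from err
        raise


def submit_safe_parse(api_key: str, path: str, language: str = "en") -> dict:
    """Step 1: upload the file; returns the queued job as a dict."""
    file_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        body, content_type = encode_multipart(
            {"language": language}, "file", os.path.basename(path),
            fh.read(), file_type)
    req = urllib.request.Request(
        f"{BASE_URL}/workflows/safe-parse", data=body, method="POST",
        headers={"X-API-Key": api_key, "Content-Type": content_type})
    return json.loads(_send(req))


def get_job(api_key: str, job_id: str) -> dict:
    """Step 2: fetch the current job record."""
    req = urllib.request.Request(f"{BASE_URL}/documents/jobs/{job_id}",
                                 headers={"X-API-Key": api_key})
    return json.loads(_send(req))


def get_artifact_content(api_key: str, artifact_id: str) -> bytes:
    """Step 3: download one artifact's raw content."""
    req = urllib.request.Request(f"{BASE_URL}/documents/{artifact_id}/content",
                                 headers={"X-API-Key": api_key})
    return _send(req)


def wait_for_completion(fetch_job, job_id: str, interval: float = 2.0,
                        timeout: float = 300.0, failure_statuses=("failed",),
                        sleep=time.sleep) -> dict:
    """Poll fetch_job(job_id) until its status is "completed"."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status in failure_statuses:  # "failed" is an assumed status name
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Usage with a test key: `job = submit_safe_parse(key, "document.pdf")`, then `wait_for_completion(lambda jid: get_job(key, jid), job["id"])`. Passing the fetch function and `sleep` in as parameters keeps the polling loop testable without network access.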
See the full reference: Parse, Extract, Safe Parse.
Next steps
- All users
  - Workflows: file, batch, and policy-based redaction recipes
  - Entity Types: what the redaction engine detects
  - API Reference: every endpoint
- Redaction
  - Python SDK and Node.js SDK: published and ready
  - LangChain integration: drop-in PII redaction middleware
  - MCP server: redaction tools for Claude Code, Claude Desktop, and other MCP clients
- Document Intelligence (beta)