Workflows
Expunct supports several workflow patterns depending on your use case. This page covers the five most common patterns with code examples.
Text redaction
The simplest workflow. Send text, get redacted text back synchronously. Best for inline or real-time redaction.
```python
from pii_redactor_sdk import PiiRedactor

client = PiiRedactor(api_key="pk_live_...")

result = client.redact.text(
    text="Contact Jane Doe at jane.doe@acme.com or 555-0123",
)
print(result.redacted_text)
# "Contact [PERSON] at [EMAIL_ADDRESS] or [PHONE_NUMBER]"

# Each finding describes one detected entity.
for finding in result.findings:
    print(f"{finding.entity_type}: {finding.text} (score: {finding.score})")
```

File redaction
Use file redaction for documents, images, video, and audio. It is asynchronous: you submit a file, then poll for the result.
Supported file types
| Format | Extensions | Typical completion time |
|---|---|---|
| Documents | PDF, DOCX | 5-30 seconds |
| Images | PNG, JPG | 3-15 seconds |
| Video | MP4 | 30-300 seconds |
| Audio | WAV, MP3 | 15-120 seconds |
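
The typical completion times above can guide how long a client waits before giving up on a job. The sketch below is one way to do that, not an SDK feature: `TYPICAL_MAX_SECONDS`, `suggested_timeout`, and `wait_until_done` are hypothetical helpers, and the safety factor of 2 is an arbitrary assumption.

```python
import time
from pathlib import Path
from typing import Callable

# Upper bounds of the typical completion times in the table (seconds).
TYPICAL_MAX_SECONDS = {
    ".pdf": 30, ".docx": 30,
    ".png": 15, ".jpg": 15,
    ".mp4": 300,
    ".wav": 120, ".mp3": 120,
}


def suggested_timeout(file_path: str, safety_factor: float = 2.0) -> float:
    """Return a polling timeout for a file, based on its extension."""
    ext = Path(file_path).suffix.lower()
    if ext not in TYPICAL_MAX_SECONDS:
        raise ValueError(f"Unsupported file type: {ext or file_path}")
    return TYPICAL_MAX_SECONDS[ext] * safety_factor


def wait_until_done(
    fetch_status: Callable[[], str],
    timeout: float,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in ("completed", "failed", "error"):
            return status
        # Stop if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        sleep(interval)
```

When polling a submitted job, `fetch_status` could wrap the status lookup, for example `lambda: client.jobs.get(job.job_id).status`.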
Submit and poll
```python
import time

from pii_redactor_sdk import PiiRedactor

client = PiiRedactor(api_key="pk_live_...")

# Submit a file
job = client.redact.file(file_path="/path/to/document.pdf")
print(f"Job submitted: {job.job_id}")

# Poll for completion
while True:
    status = client.jobs.get(job.job_id)
    print(f"Status: {status.status}")
    if status.status == "completed":
        print(f"Redacted file: {status.output_uri}")
        break
    elif status.status in ("failed", "error"):
        print(f"Job failed: {status.error}")
        break
    time.sleep(2)
```

Batch redaction
Process multiple files at once using cloud URIs. A batch can contain between 1 and 100 URIs. Each URI is processed as an individual job.
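
Because a single batch accepts at most 100 URIs, a longer list has to be split into several batches first. A minimal sketch of that split follows; `chunk_uris` and `MAX_BATCH_SIZE` are hypothetical names, not part of the SDK.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 100  # upper limit on URIs per batch stated above


def chunk_uris(uris: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `uris`, each with at most `size` items."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(uris), size):
        yield list(uris[start:start + size])


# 250 placeholder URIs split into batches of 100, 100, and 50.
uris = [f"s3://my-bucket/reports/file-{i}.pdf" for i in range(250)]
print([len(chunk) for chunk in chunk_uris(uris)])  # [100, 100, 50]
```

Each chunk can then be passed as the `uris` argument of a separate `client.batch.create` call.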
```python
from pii_redactor_sdk import PiiRedactor

client = PiiRedactor(api_key="pk_live_...")

batch = client.batch.create(
    uris=[
        "s3://my-bucket/reports/q1-report.pdf",
        "s3://my-bucket/reports/q2-report.pdf",
        "s3://my-bucket/recordings/meeting-2024-03.mp4",
    ],
    output_prefix="s3://my-bucket/redacted/",
)
print(f"Batch ID: {batch.batch_id}")
print(f"Jobs created: {batch.total_jobs}")

# Check batch progress
status = client.batch.get(batch.batch_id)
print(f"Completed: {status.completed_jobs}/{status.total_jobs}")
```

Policy-based redaction
Policies let you save reusable redaction configurations. A policy defines which entity types to detect and what action to take for each one (redact, mask, pseudonymize, or allow).
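
A typo in an action name is easy to make, so it can help to check the mapping locally before creating a policy. The following is a minimal sketch that assumes the four actions listed above are the only valid ones; `validate_entity_actions` and `ALLOWED_ACTIONS` are hypothetical names, not part of the SDK.

```python
from typing import Dict

# The four actions named above; assumed here to be the complete set.
ALLOWED_ACTIONS = {"redact", "mask", "pseudonymize", "allow"}


def validate_entity_actions(entity_actions: Dict[str, str]) -> Dict[str, str]:
    """Return the mapping unchanged, or raise if any action is not recognized."""
    invalid = {
        entity: action
        for entity, action in entity_actions.items()
        if action not in ALLOWED_ACTIONS
    }
    if invalid:
        raise ValueError(f"Unknown action(s): {invalid}")
    return entity_actions


actions = validate_entity_actions({
    "PERSON": "pseudonymize",
    "EMAIL_ADDRESS": "redact",
    "PHONE_NUMBER": "mask",
})
```

The validated mapping can then be passed as `entity_actions` when creating the policy.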
Create a policy
```python
from pii_redactor_sdk import PiiRedactor

client = PiiRedactor(api_key="pk_live_...")

policy = client.policies.create(
    name="customer-support",
    description="Redact PII in customer support transcripts",
    entity_actions={
        "PERSON": "pseudonymize",
        "EMAIL_ADDRESS": "redact",
        "PHONE_NUMBER": "mask",
        "CREDIT_CARD": "redact",
        "LOCATION": "allow",
    },
)
print(f"Policy ID: {policy.policy_id}")
```

Use a policy for redaction
```python
# Reuses `client` and `policy` from the "Create a policy" example.
result = client.redact.text(
    text="Jane Doe called from 555-0123 about order #789",
    policy_id=policy.policy_id,
)
print(result.redacted_text)
# "Alice Johnson called from ***-**** about order #789"
```

Multi-language redaction
Expunct supports detection in multiple languages. Specify the language parameter to optimize detection for a particular language.
Currently supported languages:
- en: English (default)
- es: Spanish
```python
from pii_redactor_sdk import PiiRedactor

client = PiiRedactor(api_key="pk_live_...")

# Spanish text
result = client.redact.text(
    text="El paciente Juan Garcia, DNI 12345678A, vive en Madrid",
    language="es",
)
print(result.redacted_text)
# "El paciente [PERSON], DNI [US_SSN], vive en [LOCATION]"
```

The language parameter can also be used with file and batch redaction. If omitted, the default language is English.
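
Applications often hold locale tags such as `es-ES` rather than the bare codes listed above. The sketch below shows one way to map such tags onto the supported codes; `resolve_language` is a hypothetical helper, and falling back to English for unsupported languages is an assumption made here, not documented SDK behavior.

```python
from typing import Optional

SUPPORTED_LANGUAGES = {"en", "es"}  # codes listed as currently supported
DEFAULT_LANGUAGE = "en"


def resolve_language(locale_tag: Optional[str]) -> str:
    """Map a locale tag such as 'es-ES' or 'en_US' to a supported code."""
    if not locale_tag:
        return DEFAULT_LANGUAGE
    # Keep only the primary language subtag, e.g. 'es' from 'es-ES'.
    primary = locale_tag.replace("_", "-").split("-")[0].lower()
    return primary if primary in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


print(resolve_language("es-ES"))  # es
print(resolve_language("fr"))     # en (unsupported, falls back to the default)
```

The returned code can be passed as the `language` argument. Raising an error instead of falling back may be preferable when silently using English detection would be misleading.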