Workflows

Expunct supports several workflow patterns depending on your use case. This page covers the five most common patterns with code examples.

Text redaction

The simplest workflow. Send text, get redacted text back synchronously. Best for inline or real-time redaction.

    from pii_redactor_sdk import PiiRedactor

    client = PiiRedactor(api_key="pk_live_...")

    result = client.redact.text(
        text="Contact Jane Doe at jane.doe@acme.com or 555-0123",
    )

    print(result.redacted_text)
    # "Contact [PERSON] at [EMAIL_ADDRESS] or [PHONE_NUMBER]"

    for finding in result.findings:
        print(f"{finding.entity_type}: {finding.text} (score: {finding.score})")
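
When a response carries many findings, it can help to tally them by entity type before logging. The sketch below relies only on the `entity_type` attribute shown above. `count_by_entity_type` is a hypothetical helper, not part of the SDK, and the `SimpleNamespace` objects (including their scores) are illustrative stand-ins for real findings.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_entity_type(findings):
    """Return a Counter mapping each entity type to its number of findings."""
    return Counter(f.entity_type for f in findings)


# Stand-in findings shaped like the ones printed above (values are made up).
sample_findings = [
    SimpleNamespace(entity_type="PERSON", text="Jane Doe", score=0.95),
    SimpleNamespace(entity_type="EMAIL_ADDRESS", text="jane.doe@acme.com", score=0.99),
    SimpleNamespace(entity_type="PHONE_NUMBER", text="555-0123", score=0.75),
]

counts = count_by_entity_type(sample_findings)
for entity_type, n in sorted(counts.items()):
    print(f"{entity_type}: {n}")
```

With a real response, the same helper would presumably be called as `count_by_entity_type(result.findings)`.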

File redaction

For documents, images, video, and audio. File redaction is asynchronous — you submit a file, then poll for the result.

Supported file types

Format     Extensions  Typical completion time
Documents  PDF, DOCX   5-30 seconds
Images     PNG, JPG    3-15 seconds
Video      MP4         30-300 seconds
Audio      WAV, MP3    15-120 seconds
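
The typical completion times above can inform how long a caller is willing to wait for a job. The following is a minimal sketch under stated assumptions: `suggested_timeout` and `safety_factor` are hypothetical names, the times are typical values rather than guarantees, and doubling the upper bound is an arbitrary choice for illustration.

```python
from pathlib import PurePosixPath

# Upper bounds of the typical completion times in the table above, in seconds.
TYPICAL_MAX_SECONDS = {
    ".pdf": 30, ".docx": 30,
    ".png": 15, ".jpg": 15,
    ".mp4": 300,
    ".wav": 120, ".mp3": 120,
}


def suggested_timeout(file_path, safety_factor=2.0):
    """Return a polling timeout in seconds, chosen by file extension."""
    suffix = PurePosixPath(file_path).suffix.lower()
    if suffix not in TYPICAL_MAX_SECONDS:
        raise ValueError(f"Unsupported file type: {file_path!r}")
    return TYPICAL_MAX_SECONDS[suffix] * safety_factor


print(suggested_timeout("/path/to/document.pdf"))  # 60.0
print(suggested_timeout("meeting-2024-03.MP4"))    # 600.0
```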

Submit and poll

    import time

    from pii_redactor_sdk import PiiRedactor

    client = PiiRedactor(api_key="pk_live_...")

    # Submit a file
    job = client.redact.file(file_path="/path/to/document.pdf")
    print(f"Job submitted: {job.job_id}")

    # Poll for completion
    while True:
        status = client.jobs.get(job.job_id)
        print(f"Status: {status.status}")

        if status.status == "completed":
            print(f"Redacted file: {status.output_uri}")
            break
        elif status.status in ("failed", "error"):
            print(f"Job failed: {status.error}")
            break

        time.sleep(2)
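
The loop above polls forever if a job never reaches a terminal state. One way to bound it is a helper with a deadline and a growing poll interval. This is a sketch, not an SDK feature: `wait_for_job` and its parameters are hypothetical, the terminal status values are taken from the loop above, and the demonstration uses a fake status source with made-up intermediate states so that it runs without API access.

```python
import time
from types import SimpleNamespace

TERMINAL_STATUSES = ("completed", "failed", "error")  # values used in the loop above


def wait_for_job(get_status, job_id, timeout=300.0, interval=2.0,
                 max_interval=30.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until the job finishes or the timeout elapses.

    Returns the final status object; raises TimeoutError past the deadline.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status.status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job {job_id} not finished after {timeout}s")
        sleep(interval)
        # Back off so long-running video or audio jobs are polled less often.
        interval = min(interval * 2, max_interval)


# Offline demonstration with a fake status source (no API calls are made).
# "queued" and "processing" are made-up intermediate states.
fake_states = iter(["queued", "processing", "completed"])
waits = []
final = wait_for_job(
    lambda job_id: SimpleNamespace(status=next(fake_states)),
    "job-123",
    sleep=waits.append,  # record the waits instead of actually sleeping
)
print(final.status, waits)  # completed [2.0, 4.0]
```

With the real client, the call would presumably look like `wait_for_job(client.jobs.get, job.job_id, timeout=300)`, after which the caller inspects `status.status` as in the loop above.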

Batch redaction

Process multiple files at once using cloud URIs. A batch can contain between 1 and 100 URIs. Each URI is processed as an individual job.

    from pii_redactor_sdk import PiiRedactor

    client = PiiRedactor(api_key="pk_live_...")

    batch = client.batch.create(
        uris=[
            "s3://my-bucket/reports/q1-report.pdf",
            "s3://my-bucket/reports/q2-report.pdf",
            "s3://my-bucket/recordings/meeting-2024-03.mp4",
        ],
        output_prefix="s3://my-bucket/redacted/",
    )

    print(f"Batch ID: {batch.batch_id}")
    print(f"Jobs created: {batch.total_jobs}")

    # Check batch progress
    status = client.batch.get(batch.batch_id)
    print(f"Completed: {status.completed_jobs}/{status.total_jobs}")
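
Because a batch accepts at most 100 URIs, a longer list has to be split before submission. The sketch below shows one way to do that on the client side. `chunk_uris` is a hypothetical helper, and the generated S3 paths are placeholders.

```python
MAX_BATCH_SIZE = 100  # upper limit stated above


def chunk_uris(uris, size=MAX_BATCH_SIZE):
    """Split uris into consecutive lists of at most `size` items."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    return [uris[i:i + size] for i in range(0, len(uris), size)]


# Placeholder URIs for illustration only.
uris = [f"s3://my-bucket/reports/report-{n}.pdf" for n in range(250)]
chunks = chunk_uris(uris)
print([len(chunk) for chunk in chunks])  # [100, 100, 50]
```

Each chunk could then be passed to `client.batch.create(uris=chunk, output_prefix=...)`, which yields one batch ID per chunk to track.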

Policy-based redaction

Policies let you save reusable redaction configurations. A policy defines which entity types to detect and what action to take for each one (redact, mask, pseudonymize, or allow).

Create a policy

    from pii_redactor_sdk import PiiRedactor

    client = PiiRedactor(api_key="pk_live_...")

    policy = client.policies.create(
        name="customer-support",
        description="Redact PII in customer support transcripts",
        entity_actions={
            "PERSON": "pseudonymize",
            "EMAIL_ADDRESS": "redact",
            "PHONE_NUMBER": "mask",
            "CREDIT_CARD": "redact",
            "LOCATION": "allow",
        },
    )

    print(f"Policy ID: {policy.policy_id}")
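
A mistyped action name is easy to miss in a large `entity_actions` mapping. The following sketch checks a mapping against the four actions listed above before the policy is created. `invalid_entity_actions` is a hypothetical helper; whether the API itself rejects unknown actions is not stated here, so this is only a client-side precaution.

```python
ALLOWED_ACTIONS = {"redact", "mask", "pseudonymize", "allow"}


def invalid_entity_actions(entity_actions):
    """Return the {entity_type: action} entries whose action is not allowed."""
    return {
        entity: action
        for entity, action in entity_actions.items()
        if action not in ALLOWED_ACTIONS
    }


candidate = {
    "PERSON": "pseudonymize",
    "EMAIL_ADDRESS": "redact",
    "PHONE_NUMBER": "hide",  # deliberate mistake: not one of the four actions
}
print(invalid_entity_actions(candidate))  # {'PHONE_NUMBER': 'hide'}
```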

Use a policy for redaction

    result = client.redact.text(
        text="Jane Doe called from 555-0123 about order #789",
        policy_id=policy.policy_id,
    )

    print(result.redacted_text)
    # "Alice Johnson called from ***-**** about order #789"

Multi-language redaction

Expunct supports detection in multiple languages. Specify the language parameter to optimize detection for a particular language.

Currently supported languages:

  • en — English (default)
  • es — Spanish

    from pii_redactor_sdk import PiiRedactor

    client = PiiRedactor(api_key="pk_live_...")

    # Spanish text
    result = client.redact.text(
        text="El paciente Juan Garcia, DNI 12345678A, vive en Madrid",
        language="es",
    )

    print(result.redacted_text)
    # "El paciente [PERSON], DNI [US_SSN], vive en [LOCATION]"

The language parameter can also be used with file and batch redaction. If omitted, the default language is English.
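
If the language code comes from user input or file metadata, it may be worth normalizing it before it reaches the API. The sketch below is one way to do so, assuming only the two codes listed above. `resolve_language` is a hypothetical helper. The case-insensitive matching and the `ValueError` for unsupported codes are choices made for this example, since the API's own handling of unknown codes is not described here.

```python
SUPPORTED_LANGUAGES = {"en", "es"}  # from the list above
DEFAULT_LANGUAGE = "en"


def resolve_language(code=None):
    """Return a supported language code, defaulting to English when omitted."""
    if code is None:
        return DEFAULT_LANGUAGE
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {code!r}")
    return normalized


print(resolve_language())      # en
print(resolve_language("ES"))  # es
```

The result could then be passed as `language=resolve_language(user_choice)` to a text, file, or batch call, where `user_choice` is whatever value the application collected.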