Main class for LLM-backed structured extraction. Output conforms to a Pydantic schema.
Constructor
Distiller(
    model_name: str = "openai/gpt-4.1-mini",
    system_prompt: str = "Extract relevant information.",
    response_schema: Type[BaseModel] = DefaultSchema,
    engine: LLMEngine | None = None,
    cache_path: str | None = None,
)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_name | str | "openai/gpt-4.1-mini" | LLM model identifier |
| system_prompt | str | "Extract..." | System prompt for extraction |
| response_schema | Type[BaseModel] | DefaultSchema | Pydantic schema for output |
| engine | LLMEngine \| None | None | Custom LLM engine |
| cache_path | str \| None | None | Path to SQLite cache |

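
The cache_path parameter points to a SQLite cache, but the cache's internal layout is not documented here. As a rough illustration only, the sketch below shows one way such a cache could work: every input that affects the answer is hashed into a key, and the result is stored as JSON. The names `cache_key` and `SqliteCacheSketch`, and the table schema, are hypothetical and are not part of the library.

```python
import hashlib
import json
import sqlite3


def cache_key(model_name, system_prompt, query, chunk_text):
    """Hash every input that influences the answer into one stable key."""
    payload = json.dumps([model_name, system_prompt, query, chunk_text])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SqliteCacheSketch:
    """Hypothetical key-value cache; not the library's actual schema."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key):
        # Return the stored result as a dict, or None on a cache miss.
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def set(self, key, value):
        # Overwrite any earlier entry stored under the same key.
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.conn.commit()


cache = SqliteCacheSketch(":memory:")  # in-memory database for the demo
key = cache_key(
    "openai/gpt-4.1-mini",
    "Extract relevant information.",
    "What are the requirements?",
    "Students must complete 130 credits...",
)
miss = cache.get(key)
cache.set(key, {"summary": "130 credits required"})
hit = cache.get(key)
```

Including the model name and system prompt in the key means that changing either one produces a fresh cache entry instead of returning a stale result.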
Methods
distill
Extract structured data from a single chunk.
result = await distiller.distill(
    query="What are the requirements?",
    chunk_text="Students must complete 130 credits...",
)
print(result.summary)
distill_many
Batch extraction with concurrency control.
results = await distiller.distill_many(
    query="Extract key points",
    chunks=["chunk 1...", "chunk 2...", "chunk 3..."],
    concurrency=5,
)
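
How distill_many enforces its concurrency limit is not described here. A common pattern for this kind of bounded batch call is an `asyncio.Semaphore` combined with `asyncio.gather`, sketched below. `fake_distill` and `distill_many_sketch` are hypothetical stand-ins, not the library's implementation, and the sleep only simulates an LLM request.

```python
import asyncio


async def fake_distill(query, chunk_text):
    """Stand-in for a single LLM extraction call (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"query": query, "summary": chunk_text.upper()}


async def distill_many_sketch(query, chunks, concurrency=5):
    """Process all chunks with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(chunk_text):
        # Each worker waits here until one of the slots is free.
        async with semaphore:
            return await fake_distill(query, chunk_text)

    # gather preserves input order: results[i] corresponds to chunks[i].
    return await asyncio.gather(*(worker(c) for c in chunks))


results = asyncio.run(
    distill_many_sketch(
        "Extract key points",
        ["chunk 1", "chunk 2", "chunk 3"],
        concurrency=2,
    )
)
```

With this pattern, a lower limit trades throughput for fewer simultaneous requests, which can help when a provider enforces rate limits.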
create_job
Create a deferred extraction job. The job is built synchronously and does its work when job.run() is awaited.
job = distiller.create_job(
    query="Extract citations",
    chunk_text="The court held in Smith v. Jones...",
)
result = await job.run()
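
To illustrate the deferred pattern in isolation, the sketch below separates building jobs from running them, so a caller can collect jobs first and execute them later as a group. `JobSketch` is a hypothetical stand-in for whatever object create_job returns; its fields and return value are assumptions made for the demo.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class JobSketch:
    """Hypothetical deferred job: stores inputs now, works on run()."""

    query: str
    chunk_text: str

    async def run(self):
        await asyncio.sleep(0)  # placeholder for the real LLM call
        return {"query": self.query, "length": len(self.chunk_text)}


async def main():
    # Creating jobs is synchronous and cheap; nothing executes yet.
    jobs = [
        JobSketch("Extract citations", text)
        for text in ("Smith v. Jones", "Doe v. Roe")
    ]
    # Run the jobs later, together, whenever the caller decides.
    return await asyncio.gather(*(job.run() for job in jobs))


job_results = asyncio.run(main())
```

Because a job only holds its inputs until it is run, jobs can be queued, filtered, or scheduled before any model call is made.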