The @outputai/llm package is how you call LLMs from your steps and evaluators. It wraps the AI SDK and adds prompt files: version-controlled .prompt files that live alongside your code and define the provider, model, temperature, and prompt template in one place.
Generate Functions
generateText is the primary function for LLM calls. Use the output parameter with Output.* helpers to control the response shape. Use streamText for streaming responses:
| Output Shape | How | Use when you need |
|---|---|---|
| Unstructured text | generateText({ prompt }) | Summaries, emails, explanations |
| Streamed text | streamText({ prompt }) | Real-time output, long responses, UX responsiveness |
| Typed object | generateText({ prompt, output: Output.object({ schema }) }) | Structured data, evaluator judgments |
| Array of objects | generateText({ prompt, output: Output.array({ element }) }) | Lists, multiple items |
| One of N choices | generateText({ prompt, output: Output.choice({ options }) }) | Classification, routing |
| Image | generateImage({ prompt }) | Text-to-image, image-to-image, and image edits |
Text Output
Generate unstructured text from a prompt file:
steps.ts
result is a convenience alias for response.text.
Streaming
Stream text from a prompt file. Unlike generateText, streamText returns immediately with a stream result; properties like text, usage, and finishReason are promises that resolve when the stream completes.
steps.ts
streamText is not async — it returns a stream result synchronously. Iterate textStream to process chunks as they arrive. You can also await result.text to get the full text in one shot, but that collapses the stream and is functionally identical to generateText.
To process chunks with side effects (e.g., writing to stdout):
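As a minimal sketch of that pattern, the block below iterates a stream and applies a side effect to each chunk. To stay runnable without a provider it uses hypothetical stand-ins (StreamResultLike and fakeStreamText) that model only the textStream and text fields; neither name is part of @outputai/llm, and in a real step you would pass the result of streamText instead.

```typescript
// Hypothetical stand-in for the stream result; only the two fields used
// here are modeled (textStream and text).
interface StreamResultLike {
  textStream: AsyncIterable<string>;
  text: Promise<string>;
}

// Fake stream so the sketch runs without a provider. Like streamText, it is
// not async: it returns the stream result synchronously.
function fakeStreamText(chunks: string[]): StreamResultLike {
  async function* iterate(): AsyncGenerator<string> {
    for (const chunk of chunks) yield chunk;
  }
  return { textStream: iterate(), text: Promise.resolve(chunks.join("")) };
}

// Consume chunks as they arrive, applying a side effect to each one.
// Returns the number of chunks seen.
async function forwardChunks(
  result: StreamResultLike,
  write: (chunk: string) => void,
): Promise<number> {
  let count = 0;
  for await (const chunk of result.textStream) {
    write(chunk);
    count += 1;
  }
  return count;
}
```

In a Node.js step, the write callback would typically be something like (chunk) => process.stdout.write(chunk).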
Use smoothStream for more natural output pacing:
Object Output
Generate a structured object matching a Zod schema. This is what you’ll use most in evaluators:
evaluators.ts
output contains the typed object matching your schema.
Image Output
Generate images from a prompt file with generateImage. Image prompt files use plain instructions, not chat role tags like <system> or <user>:
prompts/nascar_race@v1.prompt
Call generateImage from a step:
steps.ts
result is a convenience alias for the first generated image (response.images[0]). The returned image exposes AI SDK image fields such as base64 and mediaType.
For image-to-image or edit flows, pass runtime image inputs with images and optionally mask. Output forwards these to the AI SDK prompt object:
Image inputs can be Buffer, Uint8Array, ArrayBuffer, raw base64 strings, or { data, mediaType } objects. mask uses the same input shape and requires images.
generateImage does not upload generated images, download remote images, or normalize provider-specific values like size: "auto". Download or upload files in your workflow/client code, pass image bytes to images, and pass concrete provider options through prompt front matter or providerOptions.
| Option | Description |
|---|---|
| n | Number of images to request when supported by the provider/model |
| maxImagesPerCall | AI SDK image batching limit |
| size | Concrete image size such as 1024x1024 |
| aspectRatio | Aspect ratio such as 1:1 or 16:9 |
| seed | Seed for deterministic output when supported |
| providerOptions | Provider-specific options, for example openai.quality or vertex.imageConfig.imageSize |
Array Output
Generate an array of structured items:
Choice Output
Select one value from a set of options:
Agents
The Agent class wraps AI SDK’s ToolLoopAgent with Output prompt files and the skills system. Use it when you need multi-step tool execution, conversation history, or a reusable agent instance with a fixed configuration. For single-shot LLM calls without tools, generateText is simpler.
Construction
The prompt file is loaded and rendered at construction time. Variables, skills, and tools are fixed at construction. The agent is ready to call generate() or stream() immediately.
steps.ts
| Option | Type | Default | Description |
|---|---|---|---|
| prompt | string | (required) | Prompt file name (e.g. 'writing_assistant@v1') |
| variables | Record<string, unknown> | {} | Template variables rendered at construction |
| skills | Skill[] | [] | Skill packages for the LLM |
| tools | ToolSet | {} | AI SDK tools available during the loop |
| maxSteps | number | 10 | Maximum tool-loop iterations |
| stopWhen | StopCondition | - | Custom stop condition (overrides maxSteps) |
| output | Output | - | Structured output spec (e.g. Output.object({ schema })) |
| conversationStore | ConversationStore | - | Pluggable store for multi-turn history |
| temperature | number | - | Override prompt file temperature |
| onStepFinish | Function | - | Callback after each tool-loop step |
| prepareStep | Function | - | Customize each step before execution |
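To make the maxSteps option concrete, here is a conceptual sketch of a tool loop: on each step the model either requests a tool or returns final text, and the loop ends at the final text or the step limit, whichever comes first. This is not ToolLoopAgent's implementation. ModelTurn, ToolFn, and runToolLoop are hypothetical names, tools are simplified to number-in, number-out functions, and returning null text at the limit is an assumption made for the sketch.

```typescript
// Hypothetical, simplified model of one turn in a tool loop.
type ModelTurn =
  | { kind: "tool-call"; toolName: string; input: number }
  | { kind: "text"; text: string };

// Hypothetical tool shape: real AI SDK tools carry schemas and async execute functions.
type ToolFn = (input: number) => number;

function runToolLoop(
  model: (history: readonly string[]) => ModelTurn,
  tools: Record<string, ToolFn>,
  maxSteps = 10,
): { text: string | null; steps: number } {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = model(history);
    // Final text ends the loop early.
    if (turn.kind === "text") return { text: turn.text, steps: step };
    const tool = tools[turn.toolName];
    if (!tool) throw new Error(`Unknown tool: ${turn.toolName}`);
    // Feed the tool result back so the next step can use it.
    history.push(`${turn.toolName}(${turn.input}) = ${tool(turn.input)}`);
  }
  // Step limit reached without a final answer.
  return { text: null, steps: maxSteps };
}
```

The step limit matters because a model that keeps requesting tools would otherwise loop indefinitely; stopWhen replaces this fixed cap with a custom condition.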
generate()
Run the agent and return when complete:
The response has the same shape as generateText: text, result (alias for text), output, usage, finishReason, toolCalls, etc.
Pass additional messages to extend the conversation:
stream()
Stream the agent’s response:
Like streamText, the stream result provides textStream and fullStream iterables, plus promise-based properties (text, usage, finishReason) that resolve on completion.
Structured Output
Use Output.object() with Agent to get typed responses:
steps.ts
Conversation Store
By default, Agent is stateless. Each generate() call starts fresh with only the initial prompt messages. Pass a conversationStore to maintain history across calls:
ConversationStore interface:
createMemoryConversationStore() is the built-in in-memory implementation. For production, implement the interface with your database.
stream() does not automatically append messages to the conversation store. If you use streaming with a conversation store, persist messages manually in the onFinish callback.
When to Use Agent vs generateText
| | generateText | Agent |
|---|---|---|
| Best for | Single-shot LLM calls | Multi-step tool loops |
| Tools | Supported | Supported |
| Skills | Supported | Supported |
| Conversation history | Manual | Built-in with conversationStore |
| Reusable instance | No (function call) | Yes (construct once, call many) |
| Structured output | Output.object() | Output.object() |
Start with generateText. Move to Agent when you need conversation state or a reusable instance with a fixed configuration.
Response Object
generateText returns the full AI SDK response:
| Field | Description |
|---|---|
| result | Convenience alias for text |
| text | The raw generated text |
| output | The structured output when using Output.* helpers |
| usage | Token counts: inputTokens, outputTokens, totalTokens |
| finishReason | Why generation stopped ('stop', 'length', 'tool-calls', etc.) |
| response | Raw provider response metadata |
| warnings | Any warnings from the provider |
| toolCalls | Tool calls made by the model (when using tools) |
| cost | LLM usage attribute with modelId, usage, total, and tokensUsed; null when pricing could not be computed |
The cost property is an LLM usage attribute:
usage. For example, reasoning is omitted when the model does not define separate reasoning pricing.
Streaming response shape. streamText returns a different result type. Stream iterables (textStream, fullStream) provide real-time chunks, while scalar properties (text, usage, finishReason, etc.) are promises that resolve when the stream completes:
| Field | Type | Description |
|---|---|---|
| textStream | AsyncIterable<string> | Async iterable of text chunks |
| fullStream | AsyncIterable<TextStreamPart> | Async iterable of all stream events (text deltas, tool calls, etc.) |
| text | Promise<string> | Full text, resolved on completion |
| usage | Promise<LanguageModelUsage> | Token counts, resolved on completion |
| finishReason | Promise<FinishReason> | Why generation stopped, resolved on completion |
| toolCalls | Promise | Tool calls made during streaming, resolved on completion |
| response | Promise | Raw provider response metadata |
| warnings | Promise | Any warnings from the provider |
Prompt Files
Instead of hardcoding model config and messages in your code, you write .prompt files that live in your workflow’s prompts/ folder. See the Prompts Guide for the full documentation.
prompts/generate_summary@v1.prompt
Configuration Options
| Option | Type | Description |
|---|---|---|
| provider | string | anthropic, openai, azure, vertex, bedrock, perplexity, or a provider registered with registerProvider |
| model | string | Model identifier |
| temperature | number | Sampling temperature (0.0-2.0) |
| maxTokens | number | Maximum output tokens |
| tools | object | Provider-specific tools (web search, etc.) |
| providerOptions | object | Provider-specific options; see the ProviderOptions Guide |
Providers
@outputai/llm ships built-in support for common AI SDK providers. The provider packages are peer dependencies with supported version ranges:
| Prompt provider | Peer dependency | Supported range |
|---|---|---|
| anthropic | @ai-sdk/anthropic | >=3 <4 |
| openai | @ai-sdk/openai | >=3 <4 |
| azure | @ai-sdk/azure | >=3 <4 |
| vertex | @ai-sdk/google-vertex | >=4 <5 |
| bedrock | @ai-sdk/amazon-bedrock | >=4 <5 |
| perplexity | @ai-sdk/perplexity | >=3 <4 |
Anthropic
Set the ANTHROPIC_API_KEY environment variable.
OpenAI
Set the OPENAI_API_KEY environment variable.
Azure OpenAI
Set AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_API_VERSION.
Vertex AI
Amazon Bedrock
Authenticate with AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) or IAM role-based authentication. Set AWS_SESSION_TOKEN when using temporary credentials (e.g., from aws sts assume-role).
For cross-region inference, use the regional inference profile format: us.anthropic.claude-sonnet-4-20250514-v1:0.
Always set maxTokens in your Bedrock prompt files. Unlike the direct Anthropic provider (which auto-detects per-model limits), the Bedrock SDK has no client-side defaults and relies on server-side defaults that may be lower than the model’s capacity.
When using providerOptions, use the bedrock namespace (not anthropic):
Custom Providers
Use registerProvider when you want prompt files to reference an AI SDK provider that is not built in, or when you need a custom provider instance:
Prompt Caching
When a prompt sends the same large prefix on every call (a long system prompt, few-shot examples, a pasted reference document), you can cache that prefix so the provider skips reprocessing it. Cached input is about 90% cheaper and faster to first token. How you enable it depends on the provider.
Anthropic
Anthropic caches only what you explicitly mark. Define a cacheControl set in messageOptions and attach it, with options, to the block that ends your static prefix. Everything up to and including that block is cached and reused on the next call:
prompts/generate_summary@v1.prompt
The <user> block, the part that changes each call, is re-billed at full price; the cached <system> prefix is charged at the much cheaper cache-read rate. For the 1-hour cache instead of the default 5 minutes, add ttl: 1h under cacheControl. A block can reference several sets (options="cached fast"), and a set can be reused across blocks.
Each set is a provider-namespaced providerOptions object — the same shape and namespace rules as call-level providerOptions. On Vertex with a Claude model, use the same anthropic namespace.
OpenAI
OpenAI caches automatically. There are no breakpoints to set, so the messageOptions mechanism above isn’t needed. Any prompt of 1024 tokens or longer is cached for you, with no markup. To improve hit rates across calls, set a stable promptCacheKey (and, on GPT-5.1+, extend retention) via providerOptions:
prompts/enrich_company@v1.prompt
Confirming a cache hit
Cache activity appears in the response usage and the cost event: the first call reports cache-creation tokens, and later calls within the TTL report cache-read tokens (cachedInputTokens), already priced at the cheaper rate in response.cost.
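For a rough sense of what a cache hit is worth, the sketch below estimates input cost for one call with and without a hit. It is an illustration only: the per-token price is a made-up number, the 0.1 multiplier encodes the "about 90% cheaper" figure, and any provider-specific pricing for cache creation on the first call is ignored. PromptShape and estimateInputCost are hypothetical names; the authoritative figure is response.cost.

```typescript
// Hypothetical split of a prompt into its cacheable and per-call parts.
interface PromptShape {
  prefixTokens: number; // static prefix that can be cached
  dynamicTokens: number; // per-call content, always billed at full price
}

// Estimate input cost for one call. On a cache hit, only the prefix is
// billed at the reduced cache-read rate.
function estimateInputCost(
  shape: PromptShape,
  pricePerToken: number, // hypothetical price, in any currency unit
  cacheHit: boolean,
  cacheReadMultiplier = 0.1, // assumption: cached input is about 90% cheaper
): number {
  const prefixRate = cacheHit ? pricePerToken * cacheReadMultiplier : pricePerToken;
  return shape.prefixTokens * prefixRate + shape.dynamicTokens * pricePerToken;
}
```

With a 10,000-token prefix, 500 dynamic tokens, and a price of 3 units per token, the estimate drops from 31,500 units on a miss to about 4,500 on a hit.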
Anthropic caches only prefixes above a model-specific minimum — around 1,024 tokens for most Sonnet and Opus models, higher for some. Shorter prefixes are silently not cached, with no error. A request supports at most four cache breakpoints.
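Because a prefix that is too short fails silently, a pre-flight check can catch the problem before you rely on caching. The helper below is a hypothetical sketch, not part of @outputai/llm: it assumes you already have a token count for the prefix, and it defaults to the approximate 1,024-token minimum, which you would raise for models with a higher threshold.

```typescript
// Hypothetical description of how a prompt plans to use Anthropic caching.
interface CachePlan {
  prefixTokens: number; // token count of the static prefix, measured elsewhere
  breakpoints: number; // number of cache breakpoints in the request
}

// Returns a list of human-readable problems; an empty list means the plan
// passes both checks described in the text.
function checkCachePlan(plan: CachePlan, minPrefixTokens = 1024): string[] {
  const problems: string[] = [];
  if (plan.prefixTokens < minPrefixTokens) {
    problems.push(
      `prefix has ${plan.prefixTokens} tokens, below the ${minPrefixTokens}-token minimum; it will not be cached`,
    );
  }
  if (plan.breakpoints > 4) {
    problems.push(`request uses ${plan.breakpoints} cache breakpoints; the maximum is 4`);
  }
  return problems;
}
```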
Provider Tools
Many providers offer built-in tools like web search. Configure them in YAML front matter:
prompts/research@v1.prompt
This is equivalent to calling vertex.tools.googleSearch({ mode: 'MODE_DYNAMIC', dynamicThreshold: 0.8 }) at the code level, but keeps your prompt self-contained.
YAML tools are merged with code-level tools, so you can combine provider tools (from YAML) with custom tools (from code). Code-level tools take precedence if names conflict.
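The precedence rule can be pictured as an object spread in which code-level tools are applied last. This is only an illustration of the described behavior, not the library's actual merge code; ToolMap and mergeTools are hypothetical names, and string values stand in for real tool objects.

```typescript
// Hypothetical simplified tool map; real entries are AI SDK tool objects.
type ToolMap = Record<string, unknown>;

// Later spreads win, so a code-level tool overrides a YAML tool with the same name.
function mergeTools(yamlTools: ToolMap, codeTools: ToolMap): ToolMap {
  return { ...yamlTools, ...codeTools };
}

const mergedTools = mergeTools(
  { googleSearch: "from-yaml", lookup: "from-yaml" },
  { lookup: "from-code", calculator: "from-code" },
);
// mergedTools keeps googleSearch from YAML, takes lookup from code,
// and adds calculator from code.
```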
For provider-specific tool options, see:
Tool Calling
Use tools with generateText to enable function calling:
AI SDK Pass-Through Options
All generate functions accept additional AI SDK options passed through to the provider:
| Option | Type | Description |
|---|---|---|
| tools | ToolSet | Tools the model can call (generateText and streamText) |
| toolChoice | 'auto' \| 'none' \| 'required' | Tool selection strategy |
| maxRetries | number | Max retry attempts (default: 0) |
| seed | number | Seed for deterministic output |
| abortSignal | AbortSignal | Cancel the request |
| topP | number | Nucleus sampling (0-1) |
| topK | number | Top-K sampling |
| onChunk | Function | Callback for each stream chunk (streamText only) |
| onFinish | Function | Callback when stream completes (streamText only) |
| onError | Function | Callback on stream error (streamText only) |
| experimental_transform | Function | Stream transform, e.g. smoothStream() (streamText only) |
Retries and Network Timeouts
generateText and streamText set maxRetries: 0 by default. In Output workflows, LLM calls usually run inside steps, and steps are Temporal activities. When a provider error is allowed to fail the step, Temporal records the failed activity attempt and retries it according to the workflow’s retry policy.
AI SDK retries are still available, but they happen inside one activity attempt. Use them when you want quick provider-level retries before the step fails. Keep the default when you want Temporal to be the single place that controls retries.
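Because the two retry layers nest, they multiply. As a worked example, the hypothetical helper below computes the worst-case number of provider requests for one step, assuming every request fails: each activity attempt makes one initial request plus up to maxRetries AI SDK retries. The activityAttempts parameter stands for the total attempts allowed by your Temporal retry policy.

```typescript
// Worst-case provider requests for one step when every request fails.
// activityAttempts: total Temporal attempts for the step (first attempt included).
// maxRetries: AI SDK retries inside a single attempt (0 is the default).
function worstCaseProviderRequests(activityAttempts: number, maxRetries: number): number {
  if (!Number.isInteger(activityAttempts) || activityAttempts < 1) {
    throw new RangeError("activityAttempts must be an integer >= 1");
  }
  if (!Number.isInteger(maxRetries) || maxRetries < 0) {
    throw new RangeError("maxRetries must be an integer >= 0");
  }
  return activityAttempts * (1 + maxRetries);
}
```

With three activity attempts, the default maxRetries of 0 yields at most 3 provider requests, while maxRetries of 2 yields at most 9. That multiplication is the reason to keep retries in one layer unless you specifically want both.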
Pass maxRetries in the function call when you want the AI SDK to retry provider requests:
For provider requests, headersTimeout and bodyTimeout are set to 15 minutes. This helps long-running LLM responses where the provider accepts the request but takes longer to return response headers or body chunks, for example reasoning-heavy calls. Active cancellation still works: if you pass abortSignal, or the AI SDK/provider aborts the request, that cancellation wins.
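Since cancellation through abortSignal takes precedence over these timeouts, a caller-side deadline can be built with a standard AbortController. The sketch below shows that wiring against a hypothetical stand-in, fakeLlmCall, which only honors the signal; callWithDeadline is also a hypothetical name. With a real call you would pass controller.signal as the abortSignal option.

```typescript
// Hypothetical stand-in for a slow provider call; it resolves after
// durationMs unless the signal aborts first.
function fakeLlmCall(durationMs: number, signal: AbortSignal): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted before start"));
      return;
    }
    const timer = setTimeout(() => resolve("completed"), durationMs);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(new Error("aborted in flight"));
      },
      { once: true },
    );
  });
}

// Abort the call if it runs longer than deadlineMs. Returns the result on
// success, or the error message if the call was cancelled.
async function callWithDeadline(durationMs: number, deadlineMs: number): Promise<string> {
  const controller = new AbortController();
  const deadline = setTimeout(() => controller.abort(), deadlineMs);
  try {
    return await fakeLlmCall(durationMs, controller.signal);
  } catch (error) {
    return error instanceof Error ? error.message : "unknown error";
  } finally {
    clearTimeout(deadline); // avoid a stray timer once the call settles
  }
}
```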
LLM call cost event
Each generateText and streamText call emits a cost:llm:request event after the LLM responds and cost can be computed. You can observe it with the same hooks mechanism as error hooks: register a handler with on('cost:llm:request', handler) from @outputai/core/hooks in a hook file listed under outputai.hookFiles. The payload is the same LLM usage attribute exposed on response.cost. For payload details, see Cost Events.