@outputai/llm - Output Framework

The @outputai/llm package is how you call LLMs from your steps and evaluators. It wraps the AI SDK and adds prompt files — version-controlled .prompt files that live alongside your code and define the provider, model, temperature, and prompt template in one place.

Generate Functions

generateText is the primary function for LLM calls. Use the output parameter with Output.* helpers to control the response shape. Use streamText for streaming responses:

Output Shape	How	Use when you need
Unstructured text	`generateText({ prompt })`	Summaries, emails, explanations
Streamed text	`streamText({ prompt })`	Real-time output, long responses, UX responsiveness
Typed object	`generateText({ prompt, output: Output.object({ schema }) })`	Structured data, evaluator judgments
Array of objects	`generateText({ prompt, output: Output.array({ element }) })`	Lists, multiple items
One of N choices	`generateText({ prompt, output: Output.choice({ options }) })`	Classification, routing
Image	`generateImage({ prompt })`	Text-to-image, image-to-image, and image edits

Text Output

Generate unstructured text from a prompt file:

steps.ts

import { step } from '@outputai/core';
import { generateText } from '@outputai/llm';
import { GenerateSummaryInput, GenerateSummaryOutput } from './types.js';

export const generateSummary = step({
  name: 'generateSummary',
  description: 'Generate a company summary from research data',
  inputSchema: GenerateSummaryInput,
  outputSchema: GenerateSummaryOutput,
  fn: async (input) => {
    const { result } = await generateText({
      prompt: 'generate_summary@v1',
      variables: {
        companyName: input.name,
        industry: input.industry,
        size: input.size
      }
    });

    return result;
  }
});

// types.ts
// import { z } from '@outputai/core';
//
// export const GenerateSummaryInput = z.object({
//   name: z.string(),
//   industry: z.string(),
//   size: z.number()
// });
//
// export const GenerateSummaryOutput = z.string();

result is a convenience alias for response.text.

Streaming

Stream text from a prompt file. Unlike generateText, streamText returns immediately with a stream result — properties like text, usage, and finishReason are promises that resolve when the stream completes.

steps.ts

import { step } from '@outputai/core';
import { streamText } from '@outputai/llm';
import { GenerateContentInput, GenerateContentOutput } from './types.js';

export const generateContent = step({
  name: 'generateContent',
  description: 'Streams text generation and collects chunks',
  inputSchema: GenerateContentInput,
  outputSchema: GenerateContentOutput,
  fn: async ({ topic }) => {
    const result = streamText({
      prompt: 'stream_content@v1',
      variables: { topic }
    });

    const chunks: string[] = [];
    for await (const chunk of result.textStream) {
      chunks.push(chunk);
    }

    const content = chunks.join('');
    return {
      content,
      chunkCount: chunks.length,
      // Guard against an empty stream, which would otherwise produce NaN
      avgChunkSize: chunks.length > 0 ? Math.round(content.length / chunks.length) : 0
    };
  }
});

Note that streamText is not async — it returns a stream result synchronously. Iterate textStream to process chunks as they arrive. You can also await result.text to get the full text in one shot, but that collapses the stream and is functionally identical to generateText. To process chunks with side effects (e.g., writing to stdout):

const result = streamText({
  prompt: 'generate@v1',
  variables: { topic: 'AI safety' }
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
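Because textStream is an async iterable of strings, chunk handling can be factored into a helper that is testable without a model call. The following is a minimal sketch under that assumption; collectTextStream and fakeTextStream are hypothetical names that are not part of @outputai/llm, and the fake generator only stands in for result.textStream.

```typescript
// Summary of a fully consumed text stream.
interface StreamSummary {
  content: string;
  chunkCount: number;
  avgChunkSize: number;
}

// Consumes any async iterable of text chunks, such as result.textStream.
// onChunk is an optional side-effect hook (hypothetical helper parameter).
async function collectTextStream(
  textStream: AsyncIterable<string>,
  onChunk?: (chunk: string) => void
): Promise<StreamSummary> {
  const chunks: string[] = [];
  for await (const chunk of textStream) {
    onChunk?.(chunk);
    chunks.push(chunk);
  }
  const content = chunks.join('');
  return {
    content,
    chunkCount: chunks.length,
    // An empty stream reports 0 instead of dividing by zero.
    avgChunkSize: chunks.length === 0 ? 0 : Math.round(content.length / chunks.length)
  };
}

// Stand-in stream for local testing; a real call would pass result.textStream.
async function* fakeTextStream(): AsyncGenerator<string> {
  yield 'Hello, ';
  yield 'world';
  yield '!';
}
```

In a step, the same helper would be called as collectTextStream(result.textStream, (chunk) => process.stdout.write(chunk)), which keeps the side effect and the aggregation in one pass over the stream.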

You can apply stream transforms like smoothStream for more natural output pacing:

import { streamText, smoothStream } from '@outputai/llm';

const result = streamText({
  prompt: 'generate@v1',
  variables: { topic },
  experimental_transform: smoothStream()
});

Use streaming callbacks for side effects without consuming the stream manually:

const result = streamText({
  prompt: 'generate@v1',
  variables: { topic },
  onChunk({ chunk }) {
    // Called for each chunk
  },
  onFinish({ text, usage }) {
    // Called when generation completes
  },
  onError({ error }) {
    // Called on stream error
  }
});

Object Output

Generate a structured object matching a Zod schema. This is what you’ll use most in evaluators:

evaluators.ts

import { evaluator, EvaluationBooleanResult, z } from '@outputai/core';
import { generateText, Output } from '@outputai/llm';
import { JudgeSummaryInput } from './types.js';

export const judgeSummaryQuality = evaluator({
  name: 'judgeSummaryQuality',
  description: 'Judge whether a company summary is accurate and useful',
  inputSchema: JudgeSummaryInput,
  fn: async (input) => {
    const { output } = await generateText({
      prompt: 'judge_summary@v1',
      variables: {
        summary: input.summary,
        companyName: input.companyName
      },
      output: Output.object({
        schema: z.object({
          reasoning: z.string(),
          passes: z.boolean(),
          confidence: z.number()
        })
      })
    });

    return new EvaluationBooleanResult({
      value: output.passes,
      confidence: output.confidence,
      reasoning: output.reasoning
    });
  }
});

// types.ts
// import { z } from '@outputai/core';
//
// export const JudgeSummaryInput = z.object({
//   summary: z.string(),
//   companyName: z.string()
// });

output contains the typed object matching your schema.

Image Output

Generate images from a prompt file with generateImage. Image prompt files use plain instructions, not chat role tags like <system> or <user>:

prompts/nascar_race@v1.prompt

---
provider: openai
model: gpt-image-1
size: 1024x1024
n: 1
providerOptions:
  openai:
    quality: high
---

Create a cinematic motorsport image.

Scene:
{{ scene }}

Call generateImage from a step:

steps.ts

import { randomUUID } from 'node:crypto';
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
import { step, z } from '@outputai/core';
import { generateImage } from '@outputai/llm';

export const generateRaceImage = step({
  name: 'generateRaceImage',
  description: 'Generate a race image from a prompt file',
  inputSchema: z.object({
    scene: z.string()
  }),
  outputSchema: z.object({
    fileName: z.string()
  }),
  fn: async ({ scene }) => {
    const response = await generateImage({
      prompt: 'nascar_race@v1',
      variables: { scene }
    });

    if (!response.result?.base64) {
      throw new Error('Image generation did not return base64 image data.');
    }

    mkdirSync('_temp', { recursive: true });
    const fileName = `race-${randomUUID()}.png`;
    writeFileSync(join('_temp', fileName), Buffer.from(response.result.base64, 'base64'));

    return { fileName };
  }
});

result is a convenience alias for the first generated image (response.images[0]). The returned image exposes AI SDK image fields such as base64 and mediaType.

For image-to-image or edit flows, pass runtime image inputs with images and optionally mask. Output forwards these to the AI SDK prompt object:

import { readFileSync } from 'node:fs';
import { generateImage } from '@outputai/llm';

const referenceImage = readFileSync('nascar-reference.jpg');

const response = await generateImage({
  prompt: 'nascar_race@v1',
  variables: {
    scene: 'Use the reference car as the hero car in a night race restart.'
  },
  images: [referenceImage],
  providerOptions: {
    openai: {
      background: 'auto',
      quality: 'high',
      output_format: 'png'
    }
  }
});

const image = response.result;

Supported image inputs follow the AI SDK shape: Buffer, Uint8Array, ArrayBuffer, raw base64 strings, or { data, mediaType } objects. mask uses the same input shape and requires images.

generateImage does not upload generated images, download remote images, or normalize provider-specific values like size: "auto". Download or upload files in your workflow/client code, pass image bytes to images, and pass concrete provider options through prompt front matter or providerOptions.
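Client code often hands over images as data URLs rather than raw base64. The list above names raw base64 strings, so a data URL may need its prefix removed before it is passed to images; that reading is an assumption, not documented behavior. A minimal sketch of such a conversion follows, where decodeImageString is a hypothetical helper that returns a Uint8Array, one of the input shapes listed above.

```typescript
// Accepts either raw base64 or a base64 data URL and returns the decoded bytes.
// Hypothetical helper; it is not part of @outputai/llm.
function decodeImageString(value: string): Uint8Array {
  const commaIndex = value.indexOf(',');
  const isDataUrl = value.startsWith('data:') && commaIndex !== -1;

  // Data URLs without the ";base64" marker carry percent-encoded text, not base64.
  if (isDataUrl && !value.slice(0, commaIndex).endsWith(';base64')) {
    throw new Error('Only base64-encoded data URLs are supported.');
  }

  const rawBase64 = isDataUrl ? value.slice(commaIndex + 1) : value;
  return new Uint8Array(Buffer.from(rawBase64, 'base64'));
}

// Both calls decode to the same bytes.
const fromRawBase64 = decodeImageString('aGVsbG8=');
const fromDataUrl = decodeImageString('data:image/png;base64,aGVsbG8=');
```

The resulting bytes could then be passed as images: [fromDataUrl] in the generateImage call shown earlier.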

Common image options can live in prompt front matter:

Option	Description
`n`	Number of images to request when supported by the provider/model
`maxImagesPerCall`	AI SDK image batching limit
`size`	Concrete image size such as `1024x1024`
`aspectRatio`	Aspect ratio such as `1:1` or `16:9`
`seed`	Seed for deterministic output when supported
`providerOptions`	Provider-specific options, for example `openai.quality` or `vertex.imageConfig.imageSize`

Array Output

Generate an array of structured items:

import { generateText, Output } from '@outputai/llm';
import { z } from '@outputai/core';

const { output } = await generateText({
  prompt: 'extract_contacts@v1',
  variables: { companyData: JSON.stringify(company) },
  output: Output.array({
    element: z.object({
      name: z.string(),
      role: z.string(),
      email: z.string().optional()
    })
  })
});

// output is an array of { name, role, email } objects

Choice Output

Select one value from a set of options:

import { generateText, Output } from '@outputai/llm';

const { output } = await generateText({
  prompt: 'classify_lead@v1',
  variables: { activity: leadActivity },
  output: Output.choice({ options: ['hot', 'warm', 'cold', 'unknown'] })
});

// output is one of 'hot', 'warm', 'cold', 'unknown'
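For the routing use case, the chosen value can drive an exhaustive switch so that adding an option later fails at compile time until it is handled. This is a sketch only: the LeadTemperature type mirrors the options above, while routeLead and the queue names are hypothetical.

```typescript
// Mirrors the options passed to Output.choice in the example above.
type LeadTemperature = 'hot' | 'warm' | 'cold' | 'unknown';

// Maps a classification to a destination. Queue names are illustrative only.
function routeLead(temperature: LeadTemperature): string {
  switch (temperature) {
    case 'hot':
      return 'sales-call-queue';
    case 'warm':
      return 'nurture-sequence';
    case 'cold':
      return 'newsletter';
    case 'unknown':
      return 'manual-review';
    default: {
      // If a new option is added to LeadTemperature, this assignment stops compiling.
      const unreachable: never = temperature;
      throw new Error(`Unhandled lead temperature: ${String(unreachable)}`);
    }
  }
}
```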

Agents

The Agent class wraps AI SDK’s ToolLoopAgent with Output prompt files and the skills system. Use it when you need multi-step tool execution, conversation history, or a reusable agent instance with a fixed configuration. For single-shot LLM calls without tools, generateText is simpler.

Construction

The prompt file is loaded and rendered at construction time. Variables, skills, and tools are fixed at construction. The agent is ready to call generate() or stream() immediately.

steps.ts

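import { step, z } from '@outputai/core';
import { Agent, Output, skill } from '@outputai/llm';
// ReviewContentInput, ReviewContentOutput, and reviewSchema are assumed to be
// defined elsewhere (for example in ./types.js); a reviewSchema definition
// appears under Structured Output.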

const audienceSkill = skill({
  name: 'audience_adaptation',
  description: 'Tailor feedback for the specified expertise level',
  instructions: '# Audience Adaptation\n...'
});

export const reviewContent = step({
  name: 'reviewContent',
  description: 'Review content with structured feedback',
  inputSchema: ReviewContentInput,
  outputSchema: ReviewContentOutput,
  fn: async (input) => {
    const agent = new Agent({
      prompt: 'writing_assistant@v1',
      variables: {
        content_type: input.contentType,
        focus: input.focus,
        content: input.content
      },
      skills: [audienceSkill],
      output: Output.object({ schema: reviewSchema }),
      maxSteps: 5
    });
    const { output } = await agent.generate();
    return output;
  }
});

Constructor options:

Option	Type	Default	Description
`prompt`	`string`	(required)	Prompt file name (e.g. `'writing_assistant@v1'`)
`variables`	`Record<string, unknown>`	`{}`	Template variables rendered at construction
`skills`	`Skill[]`	`[]`	Skill packages for the LLM
`tools`	`ToolSet`	`{}`	AI SDK tools available during the loop
`maxSteps`	`number`	`10`	Maximum tool-loop iterations
`stopWhen`	`StopCondition`	-	Custom stop condition (overrides `maxSteps`)
`output`	`Output`	-	Structured output spec (e.g. `Output.object({ schema })`)
`conversationStore`	`ConversationStore`	-	Pluggable store for multi-turn history
`temperature`	`number`	-	Override prompt file temperature
`onStepFinish`	`Function`	-	Callback after each tool-loop step
`prepareStep`	`Function`	-	Customize each step before execution

generate()

Run the agent and return when complete:

const result = await agent.generate();
console.log(result.text);   // Generated text
console.log(result.output); // Structured output (when using Output.object)
console.log(result.usage);  // Token counts

The result has the same shape as generateText: text, result (alias for text), output, usage, finishReason, toolCalls, etc. Pass additional messages to extend the conversation:

const result = await agent.generate({
  messages: [{ role: 'user', content: 'Now focus on the introduction section.' }]
});

stream()

Stream the agent’s response:

const stream = await agent.stream();

for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}

Like streamText, the stream result provides textStream and fullStream iterables, plus promise-based properties (text, usage, finishReason) that resolve on completion.

Structured Output

Use Output.object() with Agent to get typed responses:

steps.ts

import { Agent, Output } from '@outputai/llm';
import { z } from '@outputai/core';

const reviewSchema = z.object({
  issues: z.array(z.string()).describe('List of issues found'),
  suggestions: z.array(z.string()).describe('Actionable suggestions'),
  score: z.number().describe('Quality score 0-100'),
  summary: z.string().describe('Brief overall assessment')
});

const agent = new Agent({
  prompt: 'writing_assistant@v1',
  variables: { content_type: 'documentation', focus: 'clarity', content: markdownContent },
  output: Output.object({ schema: reviewSchema }),
  maxSteps: 5
});

const { output } = await agent.generate();
// output: { issues: string[], suggestions: string[], score: number, summary: string }

Conversation Store

By default, Agent is stateless. Each generate() call starts fresh with only the initial prompt messages. Pass a conversationStore to maintain history across calls:

import { Agent, createMemoryConversationStore } from '@outputai/llm';

const store = createMemoryConversationStore();
const chatbot = new Agent({
  prompt: 'chatbot@v1',
  conversationStore: store
});

const r1 = await chatbot.generate({
  messages: [{ role: 'user', content: 'Hello, tell me about Output.' }]
});
// r1.text: "Output is an AI framework for..."

const r2 = await chatbot.generate({
  messages: [{ role: 'user', content: 'How does it handle retries?' }]
});
// r2 sees the full conversation history from r1

For custom storage backends, implement the ConversationStore interface:

interface ConversationStore {
  getMessages(): ModelMessage[] | Promise<ModelMessage[]>;
  addMessages(messages: ModelMessage[]): void | Promise<void>;
}

createMemoryConversationStore() is the built-in in-memory implementation. For production, implement the interface with your database.
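One way to approach a database-backed store is to put persistence behind a small port and adapt it to the interface. The sketch below assumes a simplified ModelMessage shape and a hypothetical MessageBackend port; the Map-based backend exists only so the example runs without a database, and none of these helper names come from @outputai/llm.

```typescript
// Simplified stand-in for the ModelMessage type referenced by the interface.
type ModelMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Same shape as the ConversationStore interface shown above.
interface ConversationStore {
  getMessages(): ModelMessage[] | Promise<ModelMessage[]>;
  addMessages(messages: ModelMessage[]): void | Promise<void>;
}

// Hypothetical persistence port; a real one would wrap your database client.
interface MessageBackend {
  load(conversationId: string): Promise<ModelMessage[]>;
  append(conversationId: string, messages: ModelMessage[]): Promise<void>;
}

// Adapts a backend to the ConversationStore interface for one conversation.
function createBackedConversationStore(
  backend: MessageBackend,
  conversationId: string
): ConversationStore {
  return {
    getMessages: () => backend.load(conversationId),
    addMessages: (messages) => backend.append(conversationId, messages)
  };
}

// In-memory backend, used here only so the sketch runs without a database.
function createMapBackend(): MessageBackend {
  const rows = new Map<string, ModelMessage[]>();
  return {
    async load(conversationId) {
      // Return a copy so callers cannot mutate stored history.
      return [...(rows.get(conversationId) ?? [])];
    },
    async append(conversationId, messages) {
      rows.set(conversationId, [...(rows.get(conversationId) ?? []), ...messages]);
    }
  };
}
```

Scoping each store to a conversation ID keeps histories from different users or sessions separate while sharing one backend.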

stream() does not automatically append messages to the conversation store. If you use streaming with a conversation store, persist messages manually in the onFinish callback.

When to Use Agent vs generateText

	`generateText`	`Agent`
Best for	Single-shot LLM calls	Multi-step tool loops
Tools	Supported	Supported
Skills	Supported	Supported
Conversation history	Manual	Built-in with `conversationStore`
Reusable instance	No (function call)	Yes (construct once, call many)
Structured output	`Output.object()`	`Output.object()`

Start with generateText. Move to Agent when you need conversation state or a reusable instance with a fixed configuration.

Response Object

generateText returns the full AI SDK response:

Field	Description
`result`	Convenience alias for `text`
`text`	The raw generated text
`output`	The structured output when using `Output.*` helpers
`usage`	Token counts: `inputTokens`, `outputTokens`, `totalTokens`
`finishReason`	Why generation stopped (`'stop'`, `'length'`, `'tool-calls'`, etc.)
`response`	Raw provider response metadata
`warnings`	Any warnings from the provider
`toolCalls`	Tool calls made by the model (when using tools)
`cost`	LLM usage attribute with `modelId`, `usage`, `total`, and `tokensUsed`; `null` when pricing could not be computed

The cost property is an LLM usage attribute:

{
  "type": "llm:usage",
  "modelId": "gpt-4o",
  "usage": [
    { "type": "input", "ppm": 5, "amount": 217, "total": 0.001085 },
    { "type": "output", "ppm": 15, "amount": 9, "total": 0.000135 }
  ],
  "total": 0.00122,
  "tokensUsed": 226
}
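The numbers in this example are consistent with ppm being a price per million tokens and each entry's total being amount × ppm / 1,000,000. That interpretation is inferred from the sample values rather than stated by the source, so treat the sketch below as a consistency check; entryTotal and summarizeUsage are hypothetical helpers.

```typescript
// One usage dimension, using the field names from the sample above.
interface UsageEntry {
  type: string;
  ppm: number;    // assumed: price per million tokens
  amount: number; // tokens consumed in this dimension
}

const TOKENS_PER_PRICE_UNIT = 1e6;

// Cost of a single dimension under the per-million assumption.
function entryTotal(entry: UsageEntry): number {
  return (entry.ppm * entry.amount) / TOKENS_PER_PRICE_UNIT;
}

// Sums in integer "micro" units first to limit floating-point drift.
function summarizeUsage(entries: UsageEntry[]): { total: number; tokensUsed: number } {
  const microTotal = entries.reduce((sum, e) => sum + e.ppm * e.amount, 0);
  const tokensUsed = entries.reduce((sum, e) => sum + e.amount, 0);
  return { total: microTotal / TOKENS_PER_PRICE_UNIT, tokensUsed };
}

// The two entries from the sample cost attribute.
const sampleUsage: UsageEntry[] = [
  { type: 'input', ppm: 5, amount: 217 },
  { type: 'output', ppm: 15, amount: 9 }
];
const sampleSummary = summarizeUsage(sampleUsage);
```

Under that assumption the sample reproduces itself: 217 × 5 / 1,000,000 = 0.001085, 9 × 15 / 1,000,000 = 0.000135, and the two add up to 0.00122 across 226 tokens.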

Only available, finite usage dimensions are included in usage. For example, reasoning is omitted when the model does not define separate reasoning pricing.

Streaming response shape: streamText returns a different result type. Stream iterables (textStream, fullStream) provide real-time chunks, while scalar properties (text, usage, finishReason, etc.) are promises that resolve when the stream completes:

Field	Type	Description
`textStream`	`AsyncIterable<string>`	Async iterable of text chunks
`fullStream`	`AsyncIterable<TextStreamPart>`	Async iterable of all stream events (text deltas, tool calls, etc.)
`text`	`Promise<string>`	Full text, resolved on completion
`usage`	`Promise<LanguageModelUsage>`	Token counts, resolved on completion
`finishReason`	`Promise<FinishReason>`	Why generation stopped, resolved on completion
`toolCalls`	`Promise`	Tool calls made during streaming, resolved on completion
`response`	`Promise`	Raw provider response metadata
`warnings`	`Promise`	Any warnings from the provider

Prompt Files

Instead of hardcoding model config and messages in your code, you write .prompt files that live in your workflow’s prompts/ folder. See the Prompts Guide for the full documentation.

prompts/generate_summary@v1.prompt

---
provider: anthropic
model: claude-sonnet-4-20250514
temperature: 0.7
---

<system>
You write concise company summaries for sales teams.
</system>

<user>
Write a 2-3 paragraph summary of {{ companyName }}.

Industry: {{ industry }}
Company size: {{ size }} employees
</user>

Configuration Options

Option	Type	Description
`provider`	`string`	`anthropic`, `openai`, `azure`, `vertex`, `bedrock`, `perplexity`, or a provider registered with `registerProvider`
`model`	`string`	Model identifier
`temperature`	`number`	Sampling temperature (0.0-2.0)
`maxTokens`	`number`	Maximum output tokens
`tools`	`object`	Provider-specific tools (web search, etc.)
`providerOptions`	`object`	Provider-specific options — see ProviderOptions Guide

Providers

@outputai/llm ships built-in support for common AI SDK providers. The provider packages are peer dependencies with supported version ranges:

Prompt `provider`	Peer dependency	Supported range
`anthropic`	`@ai-sdk/anthropic`	`>=3 <4`
`openai`	`@ai-sdk/openai`	`>=3 <4`
`azure`	`@ai-sdk/azure`	`>=3 <4`
`vertex`	`@ai-sdk/google-vertex`	`>=4 <5`
`bedrock`	`@ai-sdk/amazon-bedrock`	`>=4 <5`
`perplexity`	`@ai-sdk/perplexity`	`>=3 <4`

Built-in provider instances are initialized lazily. Output creates the provider instance only when a prompt or API call first requests that provider, then reuses it for later calls.

Anthropic

---
provider: anthropic
model: claude-sonnet-4-20250514
---

Requires ANTHROPIC_API_KEY environment variable.

OpenAI

---
provider: openai
model: gpt-4o
---

Requires OPENAI_API_KEY environment variable.

Azure OpenAI

---
provider: azure
model: gpt-4o
---

Requires AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_API_VERSION.

Vertex AI

---
provider: vertex
model: gemini-1.5-pro
---

Requires Google Cloud authentication and configuration.

Amazon Bedrock

---
provider: bedrock
model: anthropic.claude-sonnet-4-20250514-v1:0
---

Requires AWS credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) or IAM role-based authentication. Set AWS_SESSION_TOKEN when using temporary credentials (e.g., from aws sts assume-role).

For cross-region inference, use the regional inference profile format: us.anthropic.claude-sonnet-4-20250514-v1:0.

Always set maxTokens in your Bedrock prompt files. Unlike the direct Anthropic provider (which auto-detects per-model limits), the Bedrock SDK has no client-side defaults and relies on server-side defaults that may be lower than the model’s capacity.

When using providerOptions, use the bedrock namespace (not anthropic):

providerOptions:
  bedrock:
    guardrailConfig:
      guardrailIdentifier: my-guardrail
      guardrailVersion: "1"

Custom Providers

Use registerProvider when you want prompt files to reference an AI SDK provider that is not built in, or when you need a custom provider instance:

import { createVertexAnthropic } from '@ai-sdk/google-vertex/anthropic';
import { registerProvider } from '@outputai/llm';

registerProvider('vertex-anthropic', createVertexAnthropic({
  project: process.env.GOOGLE_VERTEX_PROJECT,
  location: process.env.GOOGLE_VERTEX_LOCATION
}));

Then use the registered provider name in prompt front matter:

---
provider: vertex-anthropic
model: claude-haiku-4-5
---

Built-in providers use Output’s default fetch configuration, including longer Undici response timeouts for LLM calls that take time before returning headers or body chunks. Custom registered providers are used exactly as you register them; they do not automatically receive that custom fetch. If your custom provider also needs longer network timeouts, configure its provider instance directly.

Prompt Caching

When a prompt sends the same large prefix on every call (a long system prompt, few-shot examples, a pasted reference document), you can cache that prefix so the provider skips reprocessing it. Cached input tokens are billed at a steep discount (roughly 90% off for Anthropic cache reads; the exact rate varies by provider and model) and shorten the time to first token. How you enable caching depends on the provider.

Anthropic

Anthropic caches only what you explicitly mark. Define a cacheControl set in messageOptions and attach it — with options — to the block that ends your static prefix. Everything up to and including that block is cached and reused on the next call:

prompts/generate_summary@v1.prompt

---
provider: anthropic
model: claude-sonnet-4-20250514
messageOptions:
  cached:
    anthropic:
      cacheControl:
        type: ephemeral
---

<system options="cached">
You write concise company summaries for sales teams. Follow this style guide:
{{ style_guide }}
</system>

<user>
Summarize {{ companyName }}.
</user>

Only the <user> block — the part that changes each call — is re-billed at full price; the cached <system> prefix is charged at the much cheaper cache-read rate. For the 1-hour cache instead of the default 5 minutes, add ttl: 1h under cacheControl. A block can reference several sets (options="cached fast"), and a set can be reused across blocks.

Attach the set to the last static block, never one containing per-call {{ variables }}. A breakpoint on changing content rewrites the cache on every call and never gets a hit. Order your blocks static-first, dynamic-last.

Each set is a provider-namespaced providerOptions object — the same shape and namespace rules as call-level providerOptions. On Vertex with a Claude model, use the same anthropic namespace.

OpenAI

OpenAI caches automatically — there are no breakpoints to set, so the messageOptions mechanism above isn’t needed. Any prompt of 1024 tokens or longer is cached for you, with no markup. To improve hit rates across calls, set a stable promptCacheKey (and, on GPT-5.1+, extend retention) via providerOptions:

prompts/enrich_company@v1.prompt

---
provider: openai
model: gpt-5
providerOptions:
  openai:
    promptCacheKey: enrich-company-v1
    promptCacheRetention: 24h
---

<system>
{{ enrichment_playbook }}
</system>

<user>
Enrich {{ company }}.
</user>

Confirming a cache hit

Cache activity appears in the response usage and the cost event: the first call reports cache-creation tokens, and later calls within the TTL report cache-read tokens (cachedInputTokens), already priced at the cheaper rate in response.cost.

Anthropic caches only prefixes above a model-specific minimum — around 1,024 tokens for most Sonnet and Opus models, higher for some. Shorter prefixes are silently not cached, with no error. A request supports at most four cache breakpoints.

Provider Tools

Many providers offer built-in tools like web search. Configure them in YAML front matter:

prompts/research@v1.prompt

---
provider: vertex
model: gemini-2.0-flash
tools:
  googleSearch:
    mode: MODE_DYNAMIC
    dynamicThreshold: 0.8
---

<user>
Research {{ topic }} and provide sources
</user>

This is equivalent to calling vertex.tools.googleSearch({ mode: 'MODE_DYNAMIC', dynamicThreshold: 0.8 }) at the code level, but keeps your prompt self-contained.

YAML tools are merged with code-level tools, so you can combine provider tools (from YAML) with custom tools (from code). Code-level tools take precedence if names conflict. For provider-specific tool options, see the documentation for the corresponding AI SDK provider.
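The precedence rule can be pictured with a plain object spread, where later keys win. This illustrates the described behavior only and is not the library's actual merge code; mergeTools and the placeholder tool values are hypothetical.

```typescript
// Tools keyed by name; values are opaque for the purpose of this illustration.
type ToolMap = Record<string, unknown>;

// Later spreads win, so a code-level tool replaces a YAML tool of the same name.
function mergeTools(yamlTools: ToolMap, codeTools: ToolMap): ToolMap {
  return { ...yamlTools, ...codeTools };
}

const yamlTools: ToolMap = {
  googleSearch: { source: 'yaml' }
};
const codeTools: ToolMap = {
  searchWeb: { source: 'code' },
  googleSearch: { source: 'code' } // same name as the YAML tool
};

const effectiveTools = mergeTools(yamlTools, codeTools);
```

Here effectiveTools contains both googleSearch and searchWeb, and the googleSearch entry is the code-level one.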

Tool Calling

Use tools with generateText to enable function calling:

import { generateText, tool } from '@outputai/llm';
import { z } from '@outputai/core';

const { result, toolCalls } = await generateText({
  prompt: 'agent@v1',
  variables: { task: 'Research competitor pricing' },
  tools: {
    searchWeb: tool({
      description: 'Search the web for information',
      parameters: z.object({ query: z.string() }),
      execute: async ({ query }) => fetchSearchResults(query)
    })
  },
  toolChoice: 'auto'
});

AI SDK Pass-Through Options

All generate functions accept additional AI SDK options passed through to the provider:

Option	Type	Description
`tools`	`ToolSet`	Tools the model can call (generateText and streamText)
`toolChoice`	`'auto' \| 'none' \| 'required'`	Tool selection strategy
`maxRetries`	`number`	Max retry attempts (default: 0)
`seed`	`number`	Seed for deterministic output
`abortSignal`	`AbortSignal`	Cancel the request
`topP`	`number`	Nucleus sampling (0-1)
`topK`	`number`	Top-K sampling
`onChunk`	`Function`	Callback for each stream chunk (streamText only)
`onFinish`	`Function`	Callback when stream completes (streamText only)
`onError`	`Function`	Callback on stream error (streamText only)
`experimental_transform`	`Function`	Stream transform, e.g. `smoothStream()` (streamText only)

Options set in the prompt file (temperature, maxTokens) can be overridden at call time.

Retries and Network Timeouts

generateText and streamText set maxRetries: 0 by default. In Output workflows, LLM calls usually run inside steps, and steps are Temporal activities. When a provider error is allowed to fail the step, Temporal records the failed activity attempt and retries it according to the workflow’s retry policy.

AI SDK retries are still available, but they happen inside one activity attempt. Use them when you want quick provider-level retries before the step fails. Keep the default when you want Temporal to be the single place that controls retries.

Pass maxRetries in the function call when you want the AI SDK to retry provider requests:

const result = await generateText({
  prompt: 'summarize@v1',
  maxRetries: 2
});
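Because the two retry layers nest, they multiply. As a rough sketch, assuming a bounded Temporal retry policy (maximumAttempts counts the first attempt) and that every request fails, the worst-case number of provider requests for one LLM call can be estimated as follows; worstCaseProviderRequests is a hypothetical helper.

```typescript
// maximumAttempts: Temporal activity attempts, including the first one.
// maxRetries: AI SDK retries per attempt, on top of the initial request.
function worstCaseProviderRequests(maximumAttempts: number, maxRetries: number): number {
  if (!Number.isInteger(maximumAttempts) || maximumAttempts < 1) {
    // An unlimited policy has no finite worst case, so it is rejected here.
    throw new RangeError('maximumAttempts must be an integer >= 1');
  }
  if (!Number.isInteger(maxRetries) || maxRetries < 0) {
    throw new RangeError('maxRetries must be an integer >= 0');
  }
  return maximumAttempts * (1 + maxRetries);
}

// Default maxRetries of 0: one provider request per activity attempt.
const withDefault = worstCaseProviderRequests(3, 0);
// maxRetries of 2: up to three provider requests per activity attempt.
const withSdkRetries = worstCaseProviderRequests(3, 2);
```

With three activity attempts, the default yields at most 3 provider requests, while maxRetries: 2 raises the ceiling to 9, which is worth keeping in mind for rate limits and cost.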

Built-in providers are initialized with a custom fetch that extends Undici’s headersTimeout and bodyTimeout to 15 minutes. This helps long-running LLM responses where the provider accepts the request but takes longer to return response headers or body chunks, for example reasoning-heavy calls. Active cancellation still works: if you pass abortSignal, or the AI SDK/provider aborts the request, that cancellation wins.

LLM call cost event

Each generateText and streamText call emits a cost:llm:request event after the LLM responds and cost can be computed. You can observe it with the same hooks mechanism as error hooks: register a handler with on('cost:llm:request', handler) from @outputai/core/hooks in a hook file listed under outputai.hookFiles. The handler receives an event envelope whose payload field is the same LLM usage attribute exposed on response.cost. For payload details, see Cost Events.
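A handler for this event can be as simple as a per-model running total. The sketch below types only the payload fields it reads (modelId, total, tokensUsed) and treats the rest of the envelope as unknown; createCostLedger is a hypothetical helper, and the registration call is shown as a comment because it depends on your hook file setup.

```typescript
// Only the payload fields this sketch reads; the real envelope may carry more.
interface LlmUsagePayload {
  modelId: string;
  total: number;
  tokensUsed: number;
}

interface CostEventEnvelope {
  payload: LlmUsagePayload;
}

interface ModelSpend {
  total: number;
  tokensUsed: number;
  calls: number;
}

// Builds a handler plus a read accessor over a shared per-model ledger.
function createCostLedger() {
  const byModel = new Map<string, ModelSpend>();

  const handler = (event: CostEventEnvelope): void => {
    const { modelId, total, tokensUsed } = event.payload;
    const current = byModel.get(modelId) ?? { total: 0, tokensUsed: 0, calls: 0 };
    byModel.set(modelId, {
      total: current.total + total,
      tokensUsed: current.tokensUsed + tokensUsed,
      calls: current.calls + 1
    });
  };

  const spendFor = (modelId: string): ModelSpend | undefined => byModel.get(modelId);

  return { handler, spendFor };
}

// In a hook file, registration would follow the mechanism described above:
//   on('cost:llm:request', ledger.handler);
const ledger = createCostLedger();

// Simulated event using the sample cost attribute values.
ledger.handler({ payload: { modelId: 'gpt-4o', total: 0.00122, tokensUsed: 226 } });
```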

loadPrompt

Load and render a prompt file without generating — useful for debugging:

import { loadPrompt } from '@outputai/llm';

const prompt = loadPrompt('generate_summary@v1', {
  companyName: 'Acme Corp',
  industry: 'SaaS',
  size: 250
});

console.log(prompt.config);   // { provider: 'anthropic', model: '...', temperature: 0.7 }
console.log(prompt.messages);  // Rendered message array

API Reference

For complete TypeScript API documentation, see the LLM Module API Reference.