Anthropic

This provider supports the Anthropic Claude series of models.

Note: Anthropic models can also be accessed through Azure AI Foundry, AWS Bedrock, and Google Vertex.

Agentic Evals

For agentic evaluations with file access, tool use, and MCP servers, see the Claude Agent SDK provider.

Setup

To use Anthropic, you need to set the ANTHROPIC_API_KEY environment variable or specify the apiKey in the provider configuration.

Create an API key in the Anthropic Console.

Example of setting the environment variable:

```sh
export ANTHROPIC_API_KEY=your_api_key_here
```

Authenticating via a Claude Code session

If you already have an active Claude Code session (for example as a Claude Pro or Max subscriber), you can reuse its OAuth credential instead of creating a separate Anthropic Console API key. Set apiKeyRequired: false on the provider config:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      apiKeyRequired: false
```

When apiKeyRequired is false and no ANTHROPIC_API_KEY is available, Promptfoo loads the Claude Code OAuth credential from:

  1. The macOS keychain entry Claude Code-credentials (darwin only), then
  2. $HOME/.claude/.credentials.json on Linux and macOS, or %USERPROFILE%\.claude\.credentials.json on Windows.

Promptfoo authenticates requests with a Bearer token, sends the claude-code-20250219,oauth-2025-04-20 beta headers, and prepends the required Claude Code identity system block ("You are Claude Code, Anthropic's official CLI for Claude.") to every Messages request. Your own system prompt is still forwarded as the next system block.

If you haven't logged in yet, run claude /login to create a credential. Re-run it if Promptfoo warns that the credential has expired. Requests made this way are expected to count against your Claude subscription the same way calls from the Claude Code CLI do; check Anthropic's documentation for current billing behavior.

This also enables model-graded assertions such as llm-rubric to run without a separate Anthropic Console key; see the example below.

Models

The anthropic provider supports the following models via the messages API:

| Model ID | Description |
| --- | --- |
| anthropic:messages:claude-opus-4-7 | Claude 4.7 Opus |
| anthropic:messages:claude-sonnet-4-6 | Claude 4.6 Sonnet |
| anthropic:messages:claude-opus-4-6 | Claude 4.6 Opus |
| anthropic:messages:claude-opus-4-5-20251101 (claude-opus-4-5-latest) | Claude 4.5 Opus |
| anthropic:messages:claude-opus-4-1-20250805 (claude-opus-4-1-latest) | Claude 4.1 Opus |
| anthropic:messages:claude-opus-4-20250514 (claude-opus-4-latest) | Claude 4 Opus |
| anthropic:messages:claude-sonnet-4-5-20250929 (claude-sonnet-4-5-latest) | Claude 4.5 Sonnet |
| anthropic:messages:claude-sonnet-4-20250514 (claude-sonnet-4-latest) | Claude 4 Sonnet |
| anthropic:messages:claude-haiku-4-5-20251001 (claude-haiku-4-5-latest) | Claude 4.5 Haiku |
| anthropic:messages:claude-3-7-sonnet-20250219 (claude-3-7-sonnet-latest) | Claude 3.7 Sonnet |
| anthropic:messages:claude-3-5-sonnet-20241022 (claude-3-5-sonnet-latest) | Claude 3.5 Sonnet (v2) |
| anthropic:messages:claude-3-5-sonnet-20240620 | Claude 3.5 Sonnet (v1) |
| anthropic:messages:claude-3-5-haiku-20241022 (claude-3-5-haiku-latest) | Claude 3.5 Haiku |
| anthropic:messages:claude-3-opus-20240229 (claude-3-opus-latest) | Claude 3 Opus |
| anthropic:messages:claude-3-haiku-20240307 | Claude 3 Haiku |

Cross-Platform Model Availability

Claude models are available across multiple platforms. Here's how the model names map across different providers:

| Model | Anthropic API | Azure AI Foundry | AWS Bedrock | GCP Vertex AI |
| --- | --- | --- | --- | --- |
| Claude 4.7 Opus | claude-opus-4-7 | claude-opus-4-7 | anthropic.claude-opus-4-7 | claude-opus-4-7 |
| Claude 4.6 Sonnet | claude-sonnet-4-6 | claude-sonnet-4-6 | anthropic.claude-sonnet-4-6 | claude-sonnet-4-6 |
| Claude 4.6 Opus | claude-opus-4-6 | claude-opus-4-6-20260205 | anthropic.claude-opus-4-6-v1 | claude-opus-4-6 |
| Claude 4.5 Opus | claude-opus-4-5-20251101 (claude-opus-4-5-latest) | claude-opus-4-5-20251101 | anthropic.claude-opus-4-5-20251101-v1:0 | claude-opus-4-5@20251101 |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 (claude-sonnet-4-5-latest) | claude-sonnet-4-5-20250929 | anthropic.claude-sonnet-4-5-20250929-v1:0 | claude-sonnet-4-5@20250929 |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 (claude-haiku-4-5-latest) | claude-haiku-4-5-20251001 | anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5@20251001 |
| Claude 4.1 Opus | claude-opus-4-1-20250805 | claude-opus-4-1-20250805 | anthropic.claude-opus-4-1-20250805-v1:0 | claude-opus-4-1@20250805 |
| Claude 4 Opus | claude-opus-4-20250514 (claude-opus-4-latest) | claude-opus-4-20250514 | anthropic.claude-opus-4-20250514-v1:0 | claude-opus-4@20250514 |
| Claude 4 Sonnet | claude-sonnet-4-20250514 (claude-sonnet-4-latest) | claude-sonnet-4-20250514 | anthropic.claude-sonnet-4-20250514-v1:0 | claude-sonnet-4@20250514 |
| Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 (claude-3-7-sonnet-latest) | claude-3-7-sonnet-20250219 | anthropic.claude-3-7-sonnet-20250219-v1:0 | claude-3-7-sonnet@20250219 |
| Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 (claude-3-5-sonnet-latest) | claude-3-5-sonnet-20241022 | anthropic.claude-3-5-sonnet-20241022-v2:0 | claude-3-5-sonnet-v2@20241022 |
| Claude 3.5 Haiku | claude-3-5-haiku-20241022 (claude-3-5-haiku-latest) | claude-3-5-haiku-20241022 | anthropic.claude-3-5-haiku-20241022-v1:0 | claude-3-5-haiku@20241022 |
| Claude 3 Opus | claude-3-opus-20240229 (claude-3-opus-latest) | claude-3-opus-20240229 | anthropic.claude-3-opus-20240229-v1:0 | claude-3-opus@20240229 |
| Claude 3 Haiku | claude-3-haiku-20240307 | claude-3-haiku-20240307 | anthropic.claude-3-haiku-20240307-v1:0 | claude-3-haiku@20240307 |

Supported Parameters

| Config Property | Environment Variable | Description |
| --- | --- | --- |
| apiKey | ANTHROPIC_API_KEY | Your API key from Anthropic |
| apiKeyRequired | - | Skip the API key preflight and authenticate via a local Claude Code session |
| apiBaseUrl | ANTHROPIC_BASE_URL | The base URL for requests to the Anthropic API |
| temperature | ANTHROPIC_TEMPERATURE | Controls the randomness of the output (default: 0). Omitted when top_p is set. |
| max_tokens | ANTHROPIC_MAX_TOKENS | The maximum length of the generated text (default: 1024) |
| cost | - | Legacy per-token override applied to both input and output pricing |
| inputCost | - | Override input token pricing in promptfoo cost estimates |
| outputCost | - | Override output token pricing in promptfoo cost estimates |
| top_p | - | Controls nucleus sampling. Mutually exclusive with temperature. |
| top_k | - | Only sample from the top K options for each subsequent token |
| stop_sequences | - | Array of strings that will stop generation when encountered |
| stream | - | Enable streaming (required when max_tokens > 21,333) |
| tools | - | An array of tool or function definitions for the model to call |
| tool_choice | - | An object specifying the tool to call |
| effort | - | Output effort level: low, medium, high, xhigh, or max |
| output_format | - | JSON schema configuration for structured outputs |
| thinking | - | Configuration for Claude's extended thinking (enabled, adaptive, or disabled) |
| showThinking | - | Whether to include thinking content in the output (default: true) |
| cache_control | - | Auto-apply cache_control to the last cacheable block in the request |
| metadata | - | Request metadata such as user_id for tracking purposes |
| service_tier | - | Priority tier: auto (default) or standard_only |
| headers | - | Additional headers to be sent with the API request |
| extra_body | - | Additional parameters to be included in the API request body |

Prompt Template

To allow for compatibility with the OpenAI prompt template, the following format is supported:

prompt.json
```json
[
  {
    "role": "system",
    "content": "{{ system_message }}"
  },
  {
    "role": "user",
    "content": "{{ question }}"
  }
]
```

If a system role is specified, it is automatically added to the API request. All user and assistant messages are automatically converted into the right format for the API request. Currently, only type text is supported.

The system_message and question are example variables that can be set via test vars.
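For instance, those variables can be supplied per test case (a minimal sketch; the variable names match the template above, and the values are placeholders):

```yaml
tests:
  - vars:
      system_message: 'You are a concise geography tutor.'
      question: 'What is the capital of France?'
```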

Options

The Anthropic provider supports several options to customize the behavior of the model. These include:

  • temperature: Controls the randomness of the output.
  • max_tokens: The maximum length of the generated text.
  • top_p: Controls nucleus sampling, affecting the randomness of the output.
  • top_k: Only sample from the top K options for each subsequent token.
  • tools: An array of tool or function definitions for the model to call.
  • tool_choice: An object specifying the tool to call.
  • stop_sequences: An array of strings that stop generation when encountered.
  • metadata: Request metadata (e.g., user_id) passed to the API.
  • extra_body: Additional parameters to pass directly to the Anthropic API request body.

Example configuration with options and prompts:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      temperature: 0.0
      max_tokens: 512
      extra_body:
        custom_param: 'test_value'
prompts:
  - file://prompt.json
```

Stop Sequences

Use stop_sequences to halt generation when Claude encounters specific strings:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      stop_sequences:
        - "\n\nHuman:"
        - 'STOP'
```

Metadata

Pass request metadata to the API for tracking or auditing purposes:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      metadata:
        user_id: 'user-123'
```

Tool Calling

The Anthropic provider supports tool calling (function calling). Here's an example configuration for defining tools.

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - name: get_weather
          description: Get the current weather in a given location
          input_schema:
            type: object
            properties:
              location:
                type: string
                description: The city and state, e.g., San Francisco, CA
              unit:
                type: string
                enum:
                  - celsius
                  - fahrenheit
            required:
              - location
```

Web Search and Web Fetch Tools

Anthropic provides specialized tools for web search and web fetching capabilities:

Web Fetch Tool

The web fetch tool allows Claude to retrieve full content from web pages and PDF documents. This is useful when you want Claude to access and analyze specific web content.

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_fetch_20250910
          name: web_fetch
          max_uses: 5
          allowed_domains:
            - docs.example.com
            - help.example.com
          citations:
            enabled: true
          max_content_tokens: 50000
```

Promptfoo also supports the stable web_fetch_20260209 variant. A newer version, web_fetch_20260309, adds use_cache support for controlling whether cached content is used:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_fetch_20260209
          name: web_fetch
          max_uses: 3
          defer_loading: true
        - type: web_fetch_20260309
          name: web_fetch
          max_uses: 3
          use_cache: false # Bypass cache for fresh content
```

Web Fetch Tool Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | web_fetch_20250910 (beta), web_fetch_20260209, or web_fetch_20260309 (adds use_cache) |
| name | string | Must be web_fetch |
| max_uses | number | Maximum number of web fetches per request (optional) |
| allowed_callers | string[] | Restrict which tool callers may invoke the server tool (optional) |
| allowed_domains | string[] | List of domains to allow fetching from (optional, mutually exclusive with blocked_domains) |
| blocked_domains | string[] | List of domains to block fetching from (optional, mutually exclusive with allowed_domains) |
| defer_loading | boolean | Load the tool lazily instead of including it in the initial system prompt (optional) |
| citations | object | Enable citations with { enabled: true } (optional) |
| max_content_tokens | number | Maximum tokens for web content (optional) |
| cache_control | object | Apply Anthropic cache control to the tool definition (optional) |
| strict | boolean | Enable strict schema validation for tool names and inputs (optional) |
| use_cache | boolean | Whether to use cached content (web_fetch_20260309 only, optional) |

Web Search Tool

The web search tool allows Claude to search the internet for information:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_search_20260209
          name: web_search
          max_uses: 3
```

Web Search Tool Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | web_search_20250305 (beta) or web_search_20260209 |
| name | string | Must be web_search |
| max_uses | number | Maximum number of searches per request (optional) |
| allowed_callers | string[] | Restrict which tool callers may invoke the server tool (optional) |
| allowed_domains | string[] | Restrict results to specific domains (optional, mutually exclusive with blocked_domains) |
| blocked_domains | string[] | Exclude domains from results (optional, mutually exclusive with allowed_domains) |
| cache_control | object | Apply Anthropic cache control to the tool definition (optional) |
| defer_loading | boolean | Load the tool lazily instead of including it in the initial system prompt (optional) |
| strict | boolean | Enable strict schema validation for tool names and inputs (optional) |
| user_location | object | Approximate user location to improve search relevance (optional) |

Combined Web Search and Web Fetch

You can use both tools together for comprehensive web information gathering:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_search_20260209
          name: web_search
          max_uses: 3
        - type: web_fetch_20260309
          name: web_fetch
          max_uses: 5
          citations:
            enabled: true
```

This configuration allows the model to first search for relevant information, then fetch full content from the most promising results.

Memory Tool

Anthropic's memory_20250818 tool can be included in tools. Promptfoo passes this native tool definition through unchanged, which is useful for evaluating whether a model requests memory operations. Promptfoo does not manage Anthropic memory stores or run local memory handlers for you.

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      tools:
        - type: memory_20250818
          name: memory
          allowed_callers:
            - direct
```

Memory Tool Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | Must be memory_20250818 |
| name | string | Must be memory |
| allowed_callers | string[] | Restrict which tool callers may invoke the memory tool (optional) |
| cache_control | object | Apply Anthropic cache control to the tool definition (optional) |
| defer_loading | boolean | Load the tool lazily instead of including it in the initial prompt |
| input_examples | object[] | Example memory commands to include in the tool definition (optional) |
| strict | boolean | Enable strict schema validation for tool names and inputs (optional) |

Important Security Notes:

  • The web fetch tool requires trusted environments due to potential data exfiltration risks
  • The model cannot dynamically construct URLs; only URLs provided by users or returned in search results can be fetched
  • Use domain filtering to restrict access to specific sites:
    • Use allowed_domains to allow only trusted domains (recommended)
    • Use blocked_domains to exclude specific domains
    • Note: Only one of allowed_domains or blocked_domains can be specified, not both

See the Anthropic Tool Use Guide for more information on how to define tools, along with the tool use example.

Images / Vision

You can include images in prompts for Claude 3 and later models.

See the Claude vision example.

One important note: the Claude API only accepts base64-encoded images, whereas OpenAI's vision API can also fetch images from a URL. As a result, if you are comparing Claude and OpenAI vision capabilities, you will need separate prompts for each.

See the OpenAI vision example to understand the differences.
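For reference, a Claude message with an inline image uses an image content block like the following (a sketch for a prompts.yaml file; image_b64 is an assumed variable holding the base64-encoded image data):

```yaml
- role: user
  content:
    - type: image
      source:
        type: base64
        media_type: image/jpeg
        data: '{{image_b64}}'
    - type: text
      text: 'Describe this image'
```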

Prompt Caching

Claude supports prompt caching to optimize API usage and reduce costs for repetitive tasks. This feature caches portions of your prompts to avoid reprocessing identical content in subsequent requests.

Supported on all Claude 3, 3.5, and 4 models. Basic example:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
prompts:
  - file://prompts.yaml
```

prompts.yaml
```yaml
- role: system
  content:
    - type: text
      text: 'System message'
      cache_control:
        type: ephemeral
    - type: text
      text: '{{context}}'
      cache_control:
        type: ephemeral
- role: user
  content: '{{question}}'
```

As a simpler alternative, use the top-level cache_control parameter to automatically apply a cache marker to the last cacheable block in the request, without annotating each block individually:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      cache_control:
        type: ephemeral
```

Common use cases for caching:

  • System messages and instructions
  • Tool/function definitions
  • Large context documents
  • Frequently used images

Cache read and creation token counts are tracked in the response's token usage details.
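When a cache entry is created or read, the API's usage object reports those counts alongside regular input and output tokens, roughly like this (illustrative values):

```json
{
  "usage": {
    "input_tokens": 21,
    "cache_creation_input_tokens": 1450,
    "cache_read_input_tokens": 0,
    "output_tokens": 312
  }
}
```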

See Anthropic's Prompt Caching Guide for more details on requirements, pricing, and best practices.

Citations

Claude can provide detailed citations when answering questions about documents. Basic example:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
prompts:
  - file://prompts.yaml
```

prompts.yaml
```yaml
- role: user
  content:
    - type: document
      source:
        type: text
        media_type: text/plain
        data: 'Your document text here'
      citations:
        enabled: true
    - type: text
      text: 'Your question here'
```

See Anthropic's Citations Guide for more details.

PDF Documents

Claude can process PDF files using document content blocks. Pass the PDF as base64-encoded data:

```yaml
- role: user
  content:
    - type: document
      source:
        type: base64
        media_type: application/pdf
        data: '{{pdf_base64}}'
    - type: text
      text: 'Summarize this document'
```

Use a test var to supply the base64-encoded PDF content:

```yaml
tests:
  - vars:
      pdf_base64: file://document.pdf
```

Claude Opus 4.7 notes

Opus 4.7 is designed around adaptive thinking and runs with the reasoning stack always on. Promptfoo handles the key differences from earlier Opus models automatically:

  • Temperature is managed for you. Opus 4.7 samples adaptively and does not accept temperature; promptfoo omits the field from every request. Passing temperature in config or ANTHROPIC_TEMPERATURE logs a one-time heads-up so you can clean the value out of your eval.
  • Adaptive thinking is the default. Use thinking: { type: 'adaptive' } (or leave thinking unset) to let the model choose how much to reason per request. Budget-based modes from older models aren't used on 4.7.
  • xhigh effort level is available. It sits between high and max and is a good starting point for coding and agentic tasks. See the Effort Level section.
  • Updated tokenizer. The same input can map to 1.0–1.35× more tokens than Opus 4.6, so measure real traffic if you're comparing costs.

The same guidance applies when you reach Opus 4.7 through AWS Bedrock, GCP Vertex, or Azure AI Foundry: promptfoo suppresses temperature on each of those paths as well.

Extended Thinking

Claude supports an extended thinking capability that allows you to see the model's internal reasoning process before it provides the final answer. This can be configured using the thinking parameter:

promptfooconfig.yaml
```yaml
providers:
  # Adaptive thinking (recommended for Claude Opus 4.7)
  - id: anthropic:messages:claude-opus-4-7
    config:
      max_tokens: 20000
      thinking:
        type: 'adaptive'

  # Enabled thinking with explicit budget
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 20000
      thinking:
        type: 'enabled'
        budget_tokens: 16000 # Must be ≥1024 and less than max_tokens
```

The thinking configuration has three possible values:

  1. Adaptive thinking (recommended for Claude Opus 4.7):

```yaml
thinking:
  type: 'adaptive'
```

In adaptive mode, Claude decides when and how much to think based on the complexity of the request. This is the recommended mode for claude-opus-4-7.

  2. Enabled thinking:

```yaml
thinking:
  type: 'enabled'
  budget_tokens: number # Must be ≥1024 and less than max_tokens
```

  3. Disabled thinking:

```yaml
thinking:
  type: 'disabled'
```

The display field controls how thinking content is returned:

  • 'summarized' (default): thinking content is included in the response
  • 'omitted': thinking content is redacted, but a signature is returned for multi-turn continuity (saves tokens)

```yaml
thinking:
  type: enabled
  budget_tokens: 10000
  display: omitted
```

When thinking is enabled or adaptive:

  • Responses will include thinking content blocks showing Claude's reasoning process
  • Requires a minimum budget of 1,024 tokens
  • The budget_tokens value must be less than the max_tokens parameter
  • The tokens used for thinking count towards your max_tokens limit
  • A specialized 28 or 29 token system prompt is automatically included
  • Previous turn thinking blocks are ignored and not counted as input tokens
  • temperature and top_k are incompatible with thinking and will be omitted with a warning
  • top_p is clamped to the range [0.95, 1.0] when thinking is enabled
  • Forced tool use (tool_choice type any or tool) is incompatible with thinking and will be omitted with a warning; use auto instead

Example response with thinking enabled:

```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis, here is the answer..."
    }
  ]
}
```

Controlling Thinking Output

By default, thinking content is included in the response output. You can control this behavior using the showThinking parameter:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      thinking:
        type: 'enabled'
        budget_tokens: 16000
      showThinking: false # Exclude thinking content from the output
```

When showThinking is set to false, the thinking content will be excluded from the output, and only the final response will be returned. This is useful when you want to use thinking for better reasoning but don't want to expose the thinking process to end users.

Redacted Thinking

Sometimes Claude's internal reasoning may be flagged by safety systems. When this occurs, the thinking block will be encrypted and returned as a redacted_thinking block:

```json
{
  "content": [
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpP..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
```

Redacted thinking blocks are automatically decrypted when passed back to the API, allowing Claude to maintain context without compromising safety guardrails.

Extended Output with Thinking

Claude 4 models provide enhanced output capabilities and extended thinking support:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 64000 # Claude 4 Sonnet supports up to 64K output tokens
      thinking:
        type: 'enabled'
        budget_tokens: 32000
```

Note: The output-128k-2025-02-19 beta feature is specific to Claude 3.7 Sonnet and is not needed for Claude 4 models, which have improved output capabilities built-in.

When using extended output:

  • Streaming is required when max_tokens is greater than 21,333
  • For thinking budgets above 32K, batch processing is recommended
  • The model may not use the entire allocated thinking budget
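Putting the first constraint into config terms, a large-output eval might enable streaming explicitly (a sketch; stream is the provider config option listed in the parameters table above):

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 64000
      stream: true # Required because max_tokens exceeds 21,333
```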

See Anthropic's Extended Thinking Guide for more details on requirements and best practices.

Effort Level

The effort parameter controls the output quality/speed tradeoff. Higher effort levels may produce more thorough responses but take longer:

```yaml
providers:
  - id: anthropic:messages:claude-opus-4-7
    config:
      effort: xhigh # Options: low, medium, high, xhigh, max
```

Claude Opus 4.7 introduces the xhigh level between high and max, giving finer control over reasoning/latency on hard problems. For coding and agentic use cases, Anthropic recommends starting with high or xhigh.

This can be combined with other features like structured outputs:

```yaml
providers:
  - id: anthropic:messages:claude-opus-4-7
    config:
      effort: high
      output_format:
        type: json_schema
        schema:
          type: object
          properties:
            analysis:
              type: string
          required:
            - analysis
          additionalProperties: false
```

Structured Outputs

Structured outputs constrain Claude's responses to a JSON schema. They are supported on Claude Sonnet 4.5 and later and Claude Opus 4.1 and later (including Opus 4.7, Opus 4.6, and Sonnet 4.6).

JSON Outputs

Add output_format to get structured responses:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      output_format:
        type: json_schema
        schema:
          type: object
          properties:
            name:
              type: string
            email:
              type: string
          required:
            - name
            - email
          additionalProperties: false
```

You can also load the entire output_format from an external file:

```yaml
config:
  output_format: file://./schemas/analysis-format.json
```

Nested file references are supported for the schema:

analysis-format.json
```json
{
  "type": "json_schema",
  "schema": "file://./schemas/analysis-schema.json"
}
```

Variable rendering is supported in file paths:

```yaml
config:
  output_format: file://./schemas/{{ schema_name }}.json
```

Strict Tool Use

Add strict: true to tool definitions for schema-validated parameters:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - name: get_weather
          strict: true
          input_schema:
            type: object
            properties:
              location:
                type: string
            required:
              - location
            additionalProperties: false
```

Limitations

Supported: object, array, string, integer, number, boolean, null, enum, required, additionalProperties: false

Not supported: recursive schemas, minimum/maximum, minLength/maxLength

Incompatible with: citations, message prefilling

See Anthropic's guide and the structured outputs example.

Model-Graded Tests

Model-graded assertions such as factuality or llm-rubric will automatically use Anthropic as the grading provider if ANTHROPIC_API_KEY is set and OPENAI_API_KEY is not set.

If both API keys are present, OpenAI will be used by default. You can explicitly override the grading provider in your configuration.

Claude Pro/Max subscribers without a separate Anthropic Console key can wire up llm-rubric through a local Claude Code session by pointing the grader at anthropic:messages:<model> with apiKeyRequired: false:

```yaml
defaultTest:
  options:
    provider:
      id: anthropic:messages:claude-sonnet-4-6
      config:
        apiKeyRequired: false
```

See Authenticating via a Claude Code session above for how the credential is loaded and what beta headers Promptfoo sets.

Because of how model-graded evals are implemented, the model must support chat-formatted prompts (except for embedding or classification models).

You can override the grading provider in several ways:

  1. For all test cases using defaultTest:

promptfooconfig.yaml
```yaml
defaultTest:
  options:
    provider: anthropic:messages:claude-sonnet-4-5-20250929
```

  2. For individual assertions:

```yaml
assert:
  - type: llm-rubric
    value: Do not mention that you are an AI or chat assistant
    provider:
      id: anthropic:messages:claude-sonnet-4-5-20250929
      config:
        temperature: 0.0
```

  3. For specific tests:

```yaml
tests:
  - vars:
      question: What is the capital of France?
    options:
      provider:
        id: anthropic:messages:claude-sonnet-4-5-20250929
    assert:
      - type: llm-rubric
        value: Answer should mention Paris
```

Additional Capabilities

  • Caching: Promptfoo caches previous LLM requests by default.
  • Token Usage Tracking: Provides detailed information on the number of tokens used in each request, aiding in usage monitoring and optimization.
  • Cost Calculation: Calculates the cost of each request based on the number of tokens generated and the specific model used.

See Also

Examples

We provide several example implementations demonstrating Claude's capabilities:

Core Features

Model Comparisons & Evaluations

Cloud Platform Integrations

Agentic Evaluations

  • Claude Agent SDK - For agentic evals with file access, tool use, and MCP servers

For more examples and general usage patterns, visit our examples directory on GitHub.