Anthropic

This provider supports the Anthropic Claude series of models.

Note: Anthropic models can also be accessed through Azure AI Foundry, AWS Bedrock, and Google Vertex.

Agentic Evals

For agentic evaluations with file access, tool use, and MCP servers, see the Claude Agent SDK provider.

Setup

To use Anthropic, you need to set the ANTHROPIC_API_KEY environment variable or specify the apiKey in the provider configuration.

Create an API key in the Anthropic Console.

Example of setting the environment variable:

```sh
export ANTHROPIC_API_KEY=your_api_key_here
```

Authenticating via a Claude Code session

If you already have an active Claude Code session (for example as a Claude Pro or Max subscriber), you can reuse its OAuth credential instead of creating a separate Anthropic Console API key. Set apiKeyRequired: false on the provider config:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      apiKeyRequired: false
```

When apiKeyRequired is false and no ANTHROPIC_API_KEY is available, Promptfoo loads the Claude Code OAuth credential from:

  1. The macOS keychain entry Claude Code-credentials (darwin only), then
  2. $HOME/.claude/.credentials.json on Linux and macOS, or %USERPROFILE%\.claude\.credentials.json on Windows.

Promptfoo authenticates requests with a Bearer token, sends the claude-code-20250219,oauth-2025-04-20 beta headers, and prepends the required Claude Code identity system block ("You are Claude Code, Anthropic's official CLI for Claude.") to every Messages request. Your own system prompt is still forwarded as the next system block.

If you haven't logged in yet, run claude /login to create a credential. Re-run it if Promptfoo warns that the credential has expired. Requests made this way are expected to count against your Claude subscription the same way calls from the Claude Code CLI do; check Anthropic's documentation for current billing behavior.

This also enables model-graded assertions such as llm-rubric to run without a separate Anthropic Console key; see the example below.

Models

The anthropic provider supports the following models via the messages API:

| Model ID | Description |
| --- | --- |
| anthropic:messages:claude-opus-4-7 | Claude 4.7 Opus |
| anthropic:messages:claude-sonnet-4-6 | Claude 4.6 Sonnet |
| anthropic:messages:claude-opus-4-6 | Claude 4.6 Opus |
| anthropic:messages:claude-opus-4-5-20251101 (claude-opus-4-5-latest) | Claude 4.5 Opus |
| anthropic:messages:claude-opus-4-1-20250805 (claude-opus-4-1-latest) | Claude 4.1 Opus |
| anthropic:messages:claude-opus-4-20250514 (claude-opus-4-latest) | Claude 4 Opus |
| anthropic:messages:claude-sonnet-4-5-20250929 (claude-sonnet-4-5-latest) | Claude 4.5 Sonnet |
| anthropic:messages:claude-sonnet-4-20250514 (claude-sonnet-4-latest) | Claude 4 Sonnet |
| anthropic:messages:claude-haiku-4-5-20251001 (claude-haiku-4-5-latest) | Claude 4.5 Haiku |
| anthropic:messages:claude-3-7-sonnet-20250219 (claude-3-7-sonnet-latest) | Claude 3.7 Sonnet |
| anthropic:messages:claude-3-5-sonnet-20241022 (claude-3-5-sonnet-latest) | Claude 3.5 Sonnet (v2) |
| anthropic:messages:claude-3-5-sonnet-20240620 | Claude 3.5 Sonnet (v1) |
| anthropic:messages:claude-3-5-haiku-20241022 (claude-3-5-haiku-latest) | Claude 3.5 Haiku |
| anthropic:messages:claude-3-opus-20240229 (claude-3-opus-latest) | Claude 3 Opus |
| anthropic:messages:claude-3-haiku-20240307 | Claude 3 Haiku |

Cross-Platform Model Availability

Claude models are available across multiple platforms. Here's how the model names map across different providers:

| Model | Anthropic API | Azure AI Foundry | AWS Bedrock | GCP Vertex AI |
| --- | --- | --- | --- | --- |
| Claude 4.7 Opus | claude-opus-4-7 | claude-opus-4-7 | anthropic.claude-opus-4-7 | claude-opus-4-7 |
| Claude 4.6 Sonnet | claude-sonnet-4-6 | claude-sonnet-4-6 | anthropic.claude-sonnet-4-6 | claude-sonnet-4-6 |
| Claude 4.6 Opus | claude-opus-4-6 | claude-opus-4-6-20260205 | anthropic.claude-opus-4-6-v1 | claude-opus-4-6 |
| Claude 4.5 Opus | claude-opus-4-5-20251101 (claude-opus-4-5-latest) | claude-opus-4-5-20251101 | anthropic.claude-opus-4-5-20251101-v1:0 | claude-opus-4-5@20251101 |
| Claude 4.5 Sonnet | claude-sonnet-4-5-20250929 (claude-sonnet-4-5-latest) | claude-sonnet-4-5-20250929 | anthropic.claude-sonnet-4-5-20250929-v1:0 | claude-sonnet-4-5@20250929 |
| Claude 4.5 Haiku | claude-haiku-4-5-20251001 (claude-haiku-4-5-latest) | claude-haiku-4-5-20251001 | anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5@20251001 |
| Claude 4.1 Opus | claude-opus-4-1-20250805 | claude-opus-4-1-20250805 | anthropic.claude-opus-4-1-20250805-v1:0 | claude-opus-4-1@20250805 |
| Claude 4 Opus | claude-opus-4-20250514 (claude-opus-4-latest) | claude-opus-4-20250514 | anthropic.claude-opus-4-20250514-v1:0 | claude-opus-4@20250514 |
| Claude 4 Sonnet | claude-sonnet-4-20250514 (claude-sonnet-4-latest) | claude-sonnet-4-20250514 | anthropic.claude-sonnet-4-20250514-v1:0 | claude-sonnet-4@20250514 |
| Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 (claude-3-7-sonnet-latest) | claude-3-7-sonnet-20250219 | anthropic.claude-3-7-sonnet-20250219-v1:0 | claude-3-7-sonnet@20250219 |
| Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 (claude-3-5-sonnet-latest) | claude-3-5-sonnet-20241022 | anthropic.claude-3-5-sonnet-20241022-v2:0 | claude-3-5-sonnet-v2@20241022 |
| Claude 3.5 Haiku | claude-3-5-haiku-20241022 (claude-3-5-haiku-latest) | claude-3-5-haiku-20241022 | anthropic.claude-3-5-haiku-20241022-v1:0 | claude-3-5-haiku@20241022 |
| Claude 3 Opus | claude-3-opus-20240229 (claude-3-opus-latest) | claude-3-opus-20240229 | anthropic.claude-3-opus-20240229-v1:0 | claude-3-opus@20240229 |
| Claude 3 Haiku | claude-3-haiku-20240307 | claude-3-haiku-20240307 | anthropic.claude-3-haiku-20240307-v1:0 | claude-3-haiku@20240307 |

Supported Parameters

| Config Property | Environment Variable | Description |
| --- | --- | --- |
| apiKey | ANTHROPIC_API_KEY | Your API key from Anthropic |
| apiKeyRequired | - | Skip the API key preflight and authenticate via a local Claude Code session |
| apiBaseUrl | ANTHROPIC_BASE_URL | The base URL for requests to the Anthropic API |
| temperature | ANTHROPIC_TEMPERATURE | Controls the randomness of the output (default: 0). Omitted when top_p is set. |
| max_tokens | ANTHROPIC_MAX_TOKENS | The maximum length of the generated text (default: 1024) |
| cost | - | Legacy per-token override applied to both input and output pricing |
| inputCost | - | Override input token pricing in promptfoo cost estimates |
| outputCost | - | Override output token pricing in promptfoo cost estimates |
| top_p | - | Controls nucleus sampling. Mutually exclusive with temperature. |
| top_k | - | Only sample from the top K options for each subsequent token |
| stop_sequences | - | Array of strings that will stop generation when encountered |
| stream | - | Enable streaming (required when max_tokens > 21,333) |
| tools | - | An array of tool or function definitions for the model to call |
| tool_choice | - | An object specifying the tool to call |
| effort | - | Output effort level: low, medium, high, xhigh, or max |
| output_format | - | JSON schema configuration for structured outputs |
| thinking | - | Configuration for Claude's extended thinking (enabled, adaptive, or disabled) |
| showThinking | - | Whether to include thinking content in the output (default: true) |
| cache_control | - | Auto-apply cache_control to the last cacheable block in the request |
| metadata | - | Request metadata such as user_id for tracking purposes |
| service_tier | - | Priority tier: auto (default) or standard_only |
| headers | - | Additional headers to be sent with the API request |
| extra_body | - | Additional parameters to be included in the API request body |

Prompt Template

To allow for compatibility with the OpenAI prompt template, the following format is supported:

prompt.json
```json
[
  {
    "role": "system",
    "content": "{{ system_message }}"
  },
  {
    "role": "user",
    "content": "{{ question }}"
  }
]
```

If a system role is specified, it is automatically added to the API request. All user and assistant messages are automatically converted into the right format for the API request. Currently, only type text is supported.

The system_message and question are example variables that can be set via test vars.
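For instance, those variables can be supplied per test case (a minimal sketch; the variable names match the template above, and the values are placeholders):

```yaml
tests:
  - vars:
      system_message: 'You are a concise geography tutor.'
      question: 'What is the capital of France?'
```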

Options

The Anthropic provider supports several options to customize the behavior of the model. These include:

  • temperature: Controls the randomness of the output.
  • max_tokens: The maximum length of the generated text.
  • top_p: Controls nucleus sampling, affecting the randomness of the output.
  • top_k: Only sample from the top K options for each subsequent token.
  • tools: An array of tool or function definitions for the model to call.
  • tool_choice: An object specifying the tool to call.
  • stop_sequences: An array of strings that stop generation when encountered.
  • metadata: Request metadata (e.g., user_id) passed to the API.
  • extra_body: Additional parameters to pass directly to the Anthropic API request body.

Example configuration with options and prompts:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      temperature: 0.0
      max_tokens: 512
      extra_body:
        custom_param: 'test_value'
prompts:
  - file://prompt.json
```

Stop Sequences

Use stop_sequences to halt generation when Claude encounters specific strings:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      stop_sequences:
        - "\n\nHuman:"
        - 'STOP'
```

Metadata

Pass request metadata to the API for tracking or auditing purposes:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      metadata:
        user_id: 'user-123'
```

Tool Calling

The Anthropic provider supports tool calling (function calling). Here's an example configuration for defining tools.

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - name: get_weather
          description: Get the current weather in a given location
          input_schema:
            type: object
            properties:
              location:
                type: string
                description: The city and state, e.g., San Francisco, CA
              unit:
                type: string
                enum:
                  - celsius
                  - fahrenheit
            required:
              - location
```

Web Search and Web Fetch Tools

Anthropic provides specialized tools for web search and web fetching capabilities:

Web Fetch Tool

The web fetch tool allows Claude to retrieve full content from web pages and PDF documents. This is useful when you want Claude to access and analyze specific web content.

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_fetch_20250910
          name: web_fetch
          max_uses: 5
          allowed_domains:
            - docs.example.com
            - help.example.com
          citations:
            enabled: true
          max_content_tokens: 50000
```

Promptfoo also supports the stable web_fetch_20260209 variant. A newer version, web_fetch_20260309, adds use_cache support for controlling whether cached content is used:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_fetch_20260209
          name: web_fetch
          max_uses: 3
          defer_loading: true
        - type: web_fetch_20260309
          name: web_fetch
          max_uses: 3
          use_cache: false # Bypass cache for fresh content
```

Web Fetch Tool Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | web_fetch_20250910 (beta), web_fetch_20260209, or web_fetch_20260309 (adds use_cache) |
| name | string | Must be web_fetch |
| max_uses | number | Maximum number of web fetches per request (optional) |
| allowed_callers | string[] | Restrict which tool callers may invoke the server tool (optional) |
| allowed_domains | string[] | List of domains to allow fetching from (optional, mutually exclusive with blocked_domains) |
| blocked_domains | string[] | List of domains to block fetching from (optional, mutually exclusive with allowed_domains) |
| defer_loading | boolean | Load the tool lazily instead of including it in the initial system prompt (optional) |
| citations | object | Enable citations with { enabled: true } (optional) |
| max_content_tokens | number | Maximum tokens for web content (optional) |
| cache_control | object | Apply Anthropic cache control to the tool definition (optional) |
| strict | boolean | Enable strict schema validation for tool names and inputs (optional) |
| use_cache | boolean | Whether to use cached content (web_fetch_20260309 only, optional) |

Web Search Tool

The web search tool allows Claude to search the internet for information:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_search_20260209
          name: web_search
          max_uses: 3
```

Web Search Tool Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | web_search_20250305 (beta) or web_search_20260209 |
| name | string | Must be web_search |
| max_uses | number | Maximum number of searches per request (optional) |
| allowed_callers | string[] | Restrict which tool callers may invoke the server tool (optional) |
| allowed_domains | string[] | Restrict results to specific domains (optional, mutually exclusive with blocked_domains) |
| blocked_domains | string[] | Exclude domains from results (optional, mutually exclusive with allowed_domains) |
| cache_control | object | Apply Anthropic cache control to the tool definition (optional) |
| defer_loading | boolean | Load the tool lazily instead of including it in the initial system prompt (optional) |
| strict | boolean | Enable strict schema validation for tool names and inputs (optional) |
| user_location | object | Approximate user location to improve search relevance (optional) |

Combined Web Search and Web Fetch

You can use both tools together for comprehensive web information gathering:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - type: web_search_20260209
          name: web_search
          max_uses: 3
        - type: web_fetch_20260309
          name: web_fetch
          max_uses: 5
          citations:
            enabled: true
```

This configuration allows the model to first search for relevant information, then fetch full content from the most promising results.

Memory Tool

Anthropic's memory_20250818 tool can be included in tools. Promptfoo passes this native tool definition through unchanged, which is useful for evaluating whether a model requests memory operations. Promptfoo does not manage Anthropic memory stores or run local memory handlers for you.

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-6
    config:
      tools:
        - type: memory_20250818
          name: memory
          allowed_callers:
            - direct
```

Memory Tool Configuration Options:

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | Must be memory_20250818 |
| name | string | Must be memory |
| allowed_callers | string[] | Restrict which tool callers may invoke the memory tool (optional) |
| cache_control | object | Apply Anthropic cache control to the tool definition (optional) |
| defer_loading | boolean | Load the tool lazily instead of including it in the initial prompt |
| input_examples | object[] | Example memory commands to include in the tool definition (optional) |
| strict | boolean | Enable strict schema validation for tool names and inputs (optional) |

Important Security Notes:

  • The web fetch tool requires trusted environments due to potential data exfiltration risks
  • The model cannot dynamically construct URLs; only URLs provided by users or returned in search results can be fetched
  • Use domain filtering to restrict access to specific sites:
    • Use allowed_domains to allow only trusted domains (recommended)
    • Use blocked_domains to exclude specific domains
    • Note: Only one of allowed_domains or blocked_domains can be specified, not both

See the Anthropic Tool Use Guide for more information on how to define tools, along with the tool use example.

Images / Vision

You can include images in prompts for Claude 3 and later models.

See the Claude vision example.

One important note: the Claude API only accepts base64-encoded images, whereas OpenAI's vision API can also fetch images from a URL. As a result, if you are comparing Claude and OpenAI vision capabilities, you will need separate prompts for each.

See the OpenAI vision example to understand the differences.
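For reference, a Claude message with an inline image uses an image content block like the following (a sketch for a prompts.yaml file; image_b64 is an assumed variable holding the base64-encoded image data):

```yaml
- role: user
  content:
    - type: image
      source:
        type: base64
        media_type: image/jpeg
        data: '{{image_b64}}'
    - type: text
      text: 'Describe this image'
```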

Prompt Caching

Claude supports prompt caching to optimize API usage and reduce costs for repetitive tasks. This feature caches portions of your prompts to avoid reprocessing identical content in subsequent requests.

Supported on all Claude 3, 3.5, and 4 models. Basic example:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
prompts:
  - file://prompts.yaml
```

prompts.yaml
```yaml
- role: system
  content:
    - type: text
      text: 'System message'
      cache_control:
        type: ephemeral
    - type: text
      text: '{{context}}'
      cache_control:
        type: ephemeral
- role: user
  content: '{{question}}'
```

As a simpler alternative, use the top-level cache_control parameter to automatically apply a cache marker to the last cacheable block in the request, without annotating each block individually:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      cache_control:
        type: ephemeral
```

Common use cases for caching:

  • System messages and instructions
  • Tool/function definitions
  • Large context documents
  • Frequently used images

Cache read and creation token counts are tracked in the response's token usage details.
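When a cache entry is created or read, the API's usage object reports those counts alongside regular input and output tokens, roughly like this (illustrative values):

```json
{
  "usage": {
    "input_tokens": 21,
    "cache_creation_input_tokens": 1450,
    "cache_read_input_tokens": 0,
    "output_tokens": 312
  }
}
```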

See Anthropic's Prompt Caching Guide for more details on requirements, pricing, and best practices.

Citations

Claude can provide detailed citations when answering questions about documents. Basic example:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
prompts:
  - file://prompts.yaml
```

prompts.yaml
```yaml
- role: user
  content:
    - type: document
      source:
        type: text
        media_type: text/plain
        data: 'Your document text here'
      citations:
        enabled: true
    - type: text
      text: 'Your question here'
```

See Anthropic's Citations Guide for more details.

PDF Documents

Claude can process PDF files using document content blocks. Pass the PDF as base64-encoded data:

```yaml
- role: user
  content:
    - type: document
      source:
        type: base64
        media_type: application/pdf
        data: '{{pdf_base64}}'
    - type: text
      text: 'Summarize this document'
```

Use a test var to supply the base64-encoded PDF content:

```yaml
tests:
  - vars:
      pdf_base64: file://document.pdf
```

Claude Opus 4.7 notes

Opus 4.7 is designed around adaptive thinking and runs with the reasoning stack always on. Promptfoo handles the key differences from earlier Opus models automatically:

  • Temperature is managed for you. Opus 4.7 samples adaptively and does not accept temperature; promptfoo omits the field from every request. Passing temperature in config or ANTHROPIC_TEMPERATURE logs a one-time heads-up so you can clean the value out of your eval.
  • Adaptive thinking is the default. Use thinking: { type: 'adaptive' } (or leave thinking unset) to let the model choose how much to reason per request. Budget-based modes from older models aren't used on 4.7.
  • xhigh effort level is available. It sits between high and max and is a good starting point for coding and agentic tasks. See the Effort Level section.
  • Updated tokenizer. The same input can map to 1.0–1.35× more tokens than Opus 4.6, so measure real traffic if you're comparing costs.

The same guidance applies when you reach Opus 4.7 through AWS Bedrock, GCP Vertex, or Azure AI Foundry: promptfoo suppresses temperature on each of those paths as well.

Extended Thinking

Claude supports an extended thinking capability that allows you to see the model's internal reasoning process before it provides the final answer. This can be configured using the thinking parameter:

promptfooconfig.yaml
```yaml
providers:
  # Adaptive thinking (recommended for Claude Opus 4.7)
  - id: anthropic:messages:claude-opus-4-7
    config:
      max_tokens: 20000
      thinking:
        type: 'adaptive'

  # Enabled thinking with explicit budget
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 20000
      thinking:
        type: 'enabled'
        budget_tokens: 16000 # Must be ≥1024 and less than max_tokens
```

The thinking configuration has three possible values:

  1. Adaptive thinking (recommended for Claude Opus 4.7):

```yaml
thinking:
  type: 'adaptive'
```

In adaptive mode, Claude decides when and how much to think based on the complexity of the request. This is the recommended mode for claude-opus-4-7.

  2. Enabled thinking:

```yaml
thinking:
  type: 'enabled'
  budget_tokens: number # Must be ≥1024 and less than max_tokens
```

  3. Disabled thinking:

```yaml
thinking:
  type: 'disabled'
```

The display field controls how thinking content is returned:

  • 'summarized' (default): thinking content is included in the response
  • 'omitted': thinking content is redacted, but a signature is returned for multi-turn continuity (saves tokens)

```yaml
thinking:
  type: enabled
  budget_tokens: 10000
  display: omitted
```

When thinking is enabled or adaptive:

  • Responses will include thinking content blocks showing Claude's reasoning process
  • Requires a minimum budget of 1,024 tokens
  • The budget_tokens value must be less than the max_tokens parameter
  • The tokens used for thinking count towards your max_tokens limit
  • A specialized 28 or 29 token system prompt is automatically included
  • Previous turn thinking blocks are ignored and not counted as input tokens
  • temperature and top_k are incompatible with thinking and will be omitted with a warning
  • top_p is clamped to the range [0.95, 1.0] when thinking is enabled
  • Forced tool use (tool_choice type any or tool) is incompatible with thinking and will be omitted with a warning; use auto instead

Example response with thinking enabled:

```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "text",
      "text": "Based on my analysis, here is the answer..."
    }
  ]
}
```

Controlling Thinking Output

By default, thinking content is included in the response output. You can control this behavior using the showThinking parameter:

promptfooconfig.yaml
```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      thinking:
        type: 'enabled'
        budget_tokens: 16000
      showThinking: false # Exclude thinking content from the output
```

When showThinking is set to false, the thinking content will be excluded from the output, and only the final response will be returned. This is useful when you want to use thinking for better reasoning but don't want to expose the thinking process to end users.

Redacted Thinking

Sometimes Claude's internal reasoning may be flagged by safety systems. When this occurs, the thinking block will be encrypted and returned as a redacted_thinking block:

```json
{
  "content": [
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpP..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
```

Redacted thinking blocks are automatically decrypted when passed back to the API, allowing Claude to maintain context without compromising safety guardrails.

Extended Output with Thinking

Claude 4 models provide enhanced output capabilities and extended thinking support:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 64000 # Claude 4 Sonnet supports up to 64K output tokens
      thinking:
        type: 'enabled'
        budget_tokens: 32000
```

Note: The output-128k-2025-02-19 beta feature is specific to Claude 3.7 Sonnet and is not needed for Claude 4 models, which have improved output capabilities built-in.

When using extended output:

  • Streaming is required when max_tokens is greater than 21,333
  • For thinking budgets above 32K, batch processing is recommended
  • The model may not use the entire allocated thinking budget
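Putting the first constraint into config terms, a large-output eval might enable streaming explicitly (a sketch; stream is the provider config option listed in the parameters table above):

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      max_tokens: 64000
      stream: true # Required because max_tokens exceeds 21,333
```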

See Anthropic's Extended Thinking Guide for more details on requirements and best practices.

Effort Level

The effort parameter controls the output quality/speed tradeoff. Higher effort levels may produce more thorough responses but take longer:

```yaml
providers:
  - id: anthropic:messages:claude-opus-4-7
    config:
      effort: xhigh # Options: low, medium, high, xhigh, max
```

Claude Opus 4.7 introduces the xhigh level between high and max, giving finer control over reasoning/latency on hard problems. For coding and agentic use cases, Anthropic recommends starting with high or xhigh.

This can be combined with other features like structured outputs:

```yaml
providers:
  - id: anthropic:messages:claude-opus-4-7
    config:
      effort: high
      output_format:
        type: json_schema
        schema:
          type: object
          properties:
            analysis:
              type: string
          required:
            - analysis
          additionalProperties: false
```

Structured Outputs

Structured outputs constrain Claude's responses to a JSON schema. They are supported on Claude Sonnet 4.5 and later and Claude Opus 4.1 and later (including Opus 4.7, Opus 4.6, and Sonnet 4.6).

JSON Outputs

Add output_format to get structured responses:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      output_format:
        type: json_schema
        schema:
          type: object
          properties:
            name:
              type: string
            email:
              type: string
          required:
            - name
            - email
          additionalProperties: false
```

You can also load the entire output_format from an external file:

```yaml
config:
  output_format: file://./schemas/analysis-format.json
```

Nested file references are supported for the schema:

analysis-format.json
```json
{
  "type": "json_schema",
  "schema": "file://./schemas/analysis-schema.json"
}
```

Variable rendering is supported in file paths:

```yaml
config:
  output_format: file://./schemas/{{ schema_name }}.json
```

Strict Tool Use

Add strict: true to tool definitions for schema-validated parameters:

```yaml
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      tools:
        - name: get_weather
          strict: true
          input_schema:
            type: object
            properties:
              location:
                type: string
            required:
              - location
            additionalProperties: false
```

Limitations

Supported: object, array, string, integer, number, boolean, null, enum, required, additionalProperties: false

Not supported: recursive schemas, minimum/maximum, minLength/maxLength

Incompatible with: citations, message prefilling

See Anthropic's guide and the structured outputs example.

Model-Graded Tests

Model-graded assertions such as factuality or llm-rubric will automatically use Anthropic as the grading provider if ANTHROPIC_API_KEY is set and OPENAI_API_KEY is not set.

If both API keys are present, OpenAI will be used by default. You can explicitly override the grading provider in your configuration.

Claude Pro/Max subscribers without a separate Anthropic Console key can wire up llm-rubric through a local Claude Code session by pointing the grader at anthropic:messages:<model> with apiKeyRequired: false:

```yaml
defaultTest:
  options:
    provider:
      id: anthropic:messages:claude-sonnet-4-6
      config:
        apiKeyRequired: false
```

See Authenticating via a Claude Code session above for how the credential is loaded and what beta headers Promptfoo sets.

Because of how model-graded evals are implemented, the model must support chat-formatted prompts (except for embedding or classification models).

You can override the grading provider in several ways:

  1. For all test cases using defaultTest:

promptfooconfig.yaml
```yaml
defaultTest:
  options:
    provider: anthropic:messages:claude-sonnet-4-5-20250929
```

  2. For individual assertions:

```yaml
assert:
  - type: llm-rubric
    value: Do not mention that you are an AI or chat assistant
    provider:
      id: anthropic:messages:claude-sonnet-4-5-20250929
      config:
        temperature: 0.0
```

  3. For specific tests:

```yaml
tests:
  - vars:
      question: What is the capital of France?
    options:
      provider:
        id: anthropic:messages:claude-sonnet-4-5-20250929
    assert:
      - type: llm-rubric
        value: Answer should mention Paris
```

Additional Capabilities

  • Caching: Promptfoo caches previous LLM requests by default.
  • Token Usage Tracking: Provides detailed information on the number of tokens used in each request, aiding in usage monitoring and optimization.
  • Cost Calculation: Calculates the cost of each request based on the number of tokens generated and the specific model used.

See Also

Examples

We provide several example implementations demonstrating Claude's capabilities:

Core Features

Model Comparisons & Evaluations

Cloud Platform Integrations

Agentic Evaluations

  • Claude Agent SDK - For agentic evals with file access, tool use, and MCP servers

For more examples and general usage patterns, visit our examples directory on GitHub.