# WatsonX
IBM WatsonX offers a range of enterprise-grade foundation models optimized for various business use cases. This provider supports text generation and chat models from the Granite and Llama series, along with additional models for code generation and multilingual tasks.
## Supported Models
IBM watsonx.ai provides foundation models through its inference API. The promptfoo WatsonX provider currently supports text generation and chat models that can be called directly via API.
To see the latest models available in your region, query IBM's API or review IBM's documentation on supported foundation models:

```sh
curl "https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-05-01" \
  -H "Authorization: Bearer YOUR_TOKEN"
```
### Currently Available Models
The following are representative ready-to-use models that IBM currently provides for direct inferencing through the text generation or chat APIs:
#### IBM Granite

- `ibm/granite-4-h-small` - Latest ready-to-use Granite text model
- `ibm/granite-3-8b-instruct` - Older instruct model (deprecated)
- `ibm/granite-8b-code-instruct` - Code generation specialist

#### Meta Llama

- `meta-llama/llama-4-maverick-17b-128e-instruct-fp8` - Latest Llama 4 model
- `meta-llama/llama-3-3-70b-instruct` - Latest Llama 3.3 (70B)

#### Mistral

- `mistralai/mistral-large-2512` - Latest ready-to-use Mistral Large model
- `mistralai/mistral-medium-2505` - Mid-tier model
- `mistralai/mistral-small-3-1-24b-instruct-2503` - Smaller instruct model

#### Other Models

- `openai/gpt-oss-120b` - Open-source GPT-compatible model
- `sdaia/allam-1-13b-instruct` - Arabic and English instruct model
### Other Model Types
IBM watsonx.ai also offers:
- Deploy on Demand Models - Curated models that require creating a dedicated deployment first
- Embedding Models - For generating text embeddings (e.g., `ibm/granite-embedding-278m-multilingual`)
- Reranker Models - For improving search results (e.g., `cross-encoder/ms-marco-minilm-l-12-v2`)
- Vision and Guardrail Models - Models with APIs or payloads that differ from the provider's current text/chat workflow
The promptfoo WatsonX provider focuses on text generation and chat models only. Deploy on Demand, embedding, and reranker models use different API endpoints and workflows. For these model types, use IBM's API directly or create a custom provider.
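For example, an embedding workflow could be wired up through promptfoo's custom provider mechanism. The script name below is hypothetical; the file would need to implement promptfoo's custom provider interface (a `callApi` method) and call IBM's embedding endpoint itself:

```yaml
providers:
  - id: file://./watsonx-embeddings.js # hypothetical custom provider script
    label: watsonx-embeddings
```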
Keep the following in mind:

- Region-specific: Model availability varies by IBM Cloud region
- Version changes: IBM regularly updates available models
- Deprecation: Models marked "deprecated" will be removed in future releases
Always verify current availability using IBM's API or check your watsonx.ai project's model catalog.
## Prerequisites

Before integrating the WatsonX provider, ensure you have the following:

1. IBM Cloud Account: You will need an IBM Cloud account to obtain API access to WatsonX models.

2. API Key or Bearer Token, and Project ID:
   - API Key: You can retrieve your API key by logging in to your IBM Cloud account and navigating to the "API Keys" section.
   - Bearer Token: To obtain a bearer token, follow IBM's guide to generating an IAM access token.
   - Project ID: To find your project ID, log in to IBM WatsonX Prompt Lab, select your project, and locate the project ID in the provided `curl` command.

Make sure you have either the API key or bearer token, along with the project ID, before proceeding.
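If you need a bearer token, one common route is to exchange an IBM Cloud API key for an IAM access token (endpoint and grant type per IBM Cloud IAM; `IBMCLOUD_API_KEY` is a placeholder for your key):

```shell
curl -X POST "https://iam.cloud.ibm.com/identity/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=urn:ibm:params:oauth:grant-type:apikey&apikey=$IBMCLOUD_API_KEY"
```

The `access_token` field of the JSON response is the bearer token. IAM tokens expire (typically after about an hour), so they need to be refreshed periodically.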
## Installation

To install the WatsonX provider, follow these steps:

1. Install the necessary dependencies:

   ```sh
   npm install @ibm-cloud/watsonx-ai ibm-cloud-sdk-core
   ```

2. Set up the necessary environment variables. You can choose between two authentication methods.

   Option 1: IAM Authentication (Recommended)

   ```sh
   export WATSONX_AI_APIKEY=your-ibm-cloud-api-key
   export WATSONX_AI_PROJECT_ID=your-project-id
   ```

   Option 2: Bearer Token Authentication

   ```sh
   export WATSONX_AI_BEARER_TOKEN=your-bearer-token
   export WATSONX_AI_PROJECT_ID=your-project-id
   ```

   To force a specific auth method (optional):

   ```sh
   export WATSONX_AI_AUTH_TYPE=iam # or 'bearertoken'
   ```

   Authentication priority: if `WATSONX_AI_AUTH_TYPE` is not set, the provider automatically uses:

   - IAM authentication if `WATSONX_AI_APIKEY` is available
   - Bearer token authentication if `WATSONX_AI_BEARER_TOKEN` is available

3. Alternatively, you can configure the authentication and project ID directly in the configuration file:

   ```yaml
   providers:
     - id: watsonx:ibm/granite-4-h-small
       config:
         # Option 1: IAM Authentication
         apiKey: your-ibm-cloud-api-key
         # Option 2: Bearer Token Authentication
         # apiBearerToken: your-ibm-cloud-bearer-token
         projectId: your-ibm-project-id
         serviceUrl: https://us-south.ml.cloud.ibm.com
   ```
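The resolution order can be sketched as a small shell function. This is an illustration of the documented fallback behavior, not the provider's actual code:

```shell
#!/bin/sh
# Sketch of the auth fallback order: explicit WATSONX_AI_AUTH_TYPE wins,
# then the API key (IAM), then the bearer token.
resolve_auth() {
  if [ -n "$WATSONX_AI_AUTH_TYPE" ]; then
    echo "$WATSONX_AI_AUTH_TYPE"          # explicit override wins
  elif [ -n "$WATSONX_AI_APIKEY" ]; then
    echo "iam"                            # API key present -> IAM auth
  elif [ -n "$WATSONX_AI_BEARER_TOKEN" ]; then
    echo "bearertoken"                    # fall back to bearer token
  else
    echo "none"                           # nothing configured
  fi
}
```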
## Usage Examples
Once configured, you can use the WatsonX provider to generate text responses based on prompts. Here's an example using the Granite 4 H Small model:
```yaml
providers:
  - watsonx:ibm/granite-4-h-small

prompts:
  - "Answer the following question: '{{question}}'"

tests:
  - vars:
      question: 'What is the capital of France?'
    assert:
      - type: contains
        value: 'Paris'
```
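With the config above saved as `promptfooconfig.yaml`, you can run the eval and inspect results using promptfoo's standard CLI commands:

```shell
# Run the evaluation defined in promptfooconfig.yaml
npx promptfoo@latest eval

# Open the web viewer to browse results
npx promptfoo@latest view
```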
You can also use other models by changing the model ID:
```yaml
providers:
  # IBM Granite models
  - watsonx:ibm/granite-4-h-small
  - watsonx:ibm/granite-8b-code-instruct
  # Meta Llama models
  - watsonx:meta-llama/llama-3-3-70b-instruct
  - watsonx:meta-llama/llama-4-maverick-17b-128e-instruct-fp8
  # Mistral models
  - watsonx:mistralai/mistral-large-2512
  - watsonx:mistralai/mistral-medium-2505
```
## Configuration Options

### Text Generation Parameters
The WatsonX provider supports the full range of text generation parameters from the IBM SDK:
| Parameter | Type | Description |
| --- | --- | --- |
| `maxNewTokens` | number | Maximum tokens to generate (default: 100) |
| `minNewTokens` | number | Minimum tokens before stop sequences apply |
| `temperature` | number | Sampling temperature (0-2) |
| `topP` | number | Nucleus sampling parameter (0-1) |
| `topK` | number | Top-k sampling parameter |
| `decodingMethod` | string | `greedy` or `sample` |
| `stopSequences` | string[] | Sequences that cause generation to stop |
| `repetitionPenalty` | number | Penalty for repeated tokens |
| `randomSeed` | number | Seed for reproducible outputs |
| `timeLimit` | number | Time limit in milliseconds |
| `truncateInputTokens` | number | Max input tokens before truncation |
| `includeStopSequence` | boolean | Include stop sequence in output |
| `lengthPenalty` | object | Length penalty configuration |
### Example with Parameters
```yaml
providers:
  - id: watsonx:ibm/granite-4-h-small
    config:
      temperature: 0.7
      topP: 0.9
      topK: 50
      maxNewTokens: 1024
      stopSequences: ['END', 'STOP']
      repetitionPenalty: 1.1
      decodingMethod: sample
```
### Length Penalty
For more control over output length:
```yaml
providers:
  - id: watsonx:ibm/granite-4-h-small
    config:
      lengthPenalty:
        decayFactor: 1.5
        startIndex: 10
```
## Chat Mode

WatsonX also supports chat-style interactions using the `textChat` API. Use the `watsonx:chat:` prefix:
```yaml
providers:
  - id: watsonx:chat:ibm/granite-4-h-small
    config:
      temperature: 0.7
      maxNewTokens: 1024
```
Chat mode automatically parses messages in JSON format:
```yaml
prompts:
  - |
    [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "{{question}}"}
    ]

providers:
  - watsonx:chat:ibm/granite-4-h-small
```
For plain text prompts, the chat provider automatically wraps them as a user message.
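For example, the same chat provider accepts a bare string prompt, which is sent as a single user message:

```yaml
prompts:
  - 'What is the capital of France?'

providers:
  - watsonx:chat:ibm/granite-4-h-small
```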
Chat vs Text Generation
| Feature | Text Generation (watsonx:) | Chat (watsonx:chat:) |
|---|---|---|
| API Method | generateText | textChat |
| Input Format | Plain text | Messages array or plain text |
| Best For | Completion tasks | Conversational applications |
| System Messages | Not supported | Supported |
## Environment Variables

| Variable | Description |
| --- | --- |
| `WATSONX_AI_APIKEY` | IBM Cloud API key for IAM authentication |
| `WATSONX_AI_BEARER_TOKEN` | Bearer token for token-based authentication |
| `WATSONX_AI_PROJECT_ID` | WatsonX project ID |
| `WATSONX_AI_AUTH_TYPE` | Force auth type: `iam` or `bearertoken` |
## Migrating from IBM BAM
The IBM BAM provider has been deprecated (sunset March 2025). To migrate:
1. Change the provider prefix from `bam:` to `watsonx:`
2. Update authentication to use WatsonX credentials
3. Update model IDs to their WatsonX equivalents (e.g., `ibm/granite-4-h-small`)
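A minimal before/after sketch (the BAM model ID shown is illustrative):

```yaml
# Before (deprecated BAM provider):
# providers:
#   - bam:chat:ibm/granite-13b-chat-v2

# After:
providers:
  - watsonx:ibm/granite-4-h-small
```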