Featured

Your model upgrade just broke your agent's safety

Guangshuo Zang · 12/8/2025

Model upgrades can change refusal, instruction-following, and tool-use behavior. Here's how to prevent safety regressions in agentic apps.

Latest Posts

Real-Time Fact Checking for LLM Outputs

Michael D'Angelo · 11/28/2025

Promptfoo now supports web search in assertions, so you can verify time-sensitive information like stock prices, weather, and case citations during testing.

How to replicate the Claude Code attack with Promptfoo

Ian Webster · 11/17/2025

A recent cyber espionage campaign revealed how state actors weaponized Anthropic's Claude Code, not through traditional hacking, but by convincing the AI itself to carry out malicious operations.

Will agents hack everything?

Dane Schneider · 11/14/2025

The first state-level AI cyberattack raises hard questions: Can we stop AI agents from helping attackers? Should we?

When AI becomes the attacker: The rise of AI-orchestrated cyberattacks

Michael D'Angelo · 11/10/2025

Google's November 2025 discovery of PROMPTFLUX and PROMPTSTEAL confirms Anthropic's August threat intelligence findings on AI-orchestrated attacks.

Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter

Michael D'Angelo · 10/24/2025

RLVR trains reasoning models with programmatic verifiers instead of human labels, and the payoff is faster models rather than smarter ones.

Top 10 Open Datasets for LLM Safety, Toxicity & Bias Evaluation

Ian Webster · 10/6/2025

A comprehensive guide to the most important open-source datasets for evaluating LLM safety, including toxicity detection, bias measurement, and truthfulness benchmarks.

Testing AI’s “Lethal Trifecta” with Promptfoo

Ian Webster · 9/28/2025

Learn what the lethal trifecta is and how to use promptfoo red teaming to detect prompt injection and data exfiltration risks in AI agents.

Autonomy and agency in AI: We should secure LLMs with the same fervor spent realizing AGI

Tabs Fakier · 9/2/2025

Why securing LLMs deserves the same urgency and resources dedicated to achieving AGI, with a focus on autonomy and agency in AI systems.

Prompt Injection vs Jailbreaking: What's the Difference?

Michael D'Angelo · 8/18/2025

Learn the critical difference between prompt injection and jailbreaking attacks, with real CVEs, production defenses, and test configurations.