Latest Posts

OpenClaw at Work: Prompt Injection Risks
Red Teaming

In a controlled lab, a malicious webpage got OpenClaw to enumerate tools, read local documents, write artifacts, and send unauthorized messages to loopback sinks.

Konstantine Kahadze · Mar 12, 2026
McKinsey's Lilli Looks More Like an API Security Failure Than a Model Jailbreak
Security Vulnerability

Public reporting points to exposed API surface, unsafe SQL construction, and broken object-level authorization.

Michael D'Angelo · Mar 10, 2026
Open-Sourcing ModelAudit: Security Scanner for ML Model Files
Company Update

Promptfoo ModelAudit scans 42+ ML model formats for unsafe loading behaviors, known CVEs, and suspicious artifacts.

Yash Chhabria · Mar 3, 2026
Indirect Prompt Injection in Web-Browsing Agents
Red Teaming

Test whether AI browsing agents follow malicious instructions or leak data with the indirect-web-pwn strategy.

Yash Chhabria · Feb 6, 2026
How AI Regulation Changed in 2025
AI Policy

Why AI compliance questions multiplied in 2025.

Michael D'Angelo · Dec 15, 2025
Why Attack Success Rate (ASR) Isn't Comparable Across Jailbreak Papers Without a Shared Threat Model
Red Teaming

Attack Success Rate (ASR) is the most commonly reported metric for LLM red teaming, but it changes with attempt budget, prompt sets, and judge choice.

Michael D'Angelo · Dec 12, 2025
GPT-5.2 Initial Trust and Safety Assessment
Red Teaming

Day-0 red team results for GPT-5.2.

Michael D'Angelo · Dec 11, 2025
Your model upgrade just broke your agent's safety
Security Vulnerability

Model upgrades can change refusal, instruction-following, and tool-use behavior.

Guangshuo Zang · Dec 8, 2025
Real-Time Fact Checking for LLM Outputs
Feature Announcement

Promptfoo now supports web search in assertions, so you can verify time-sensitive information like stock prices, weather, and case citations during testing.

Michael D'Angelo · Nov 28, 2025
How to replicate the Claude Code attack with Promptfoo
AI Security

A recent cyber espionage campaign revealed how state actors weaponized Anthropic's Claude Code: not through traditional hacking, but by convincing the AI itself to carry out malicious operations.

Ian Webster · Nov 17, 2025
Will agents hack everything?
Security Vulnerability

The first state-level AI cyberattack raises hard questions: Can we stop AI agents from helping attackers? Should we?

Dane Schneider · Nov 14, 2025
When AI becomes the attacker: The rise of AI-orchestrated cyberattacks
Security Vulnerability

Google's November 2025 discovery of PROMPTFLUX and PROMPTSTEAL confirms Anthropic's August threat intelligence findings on AI-orchestrated attacks.

Michael D'Angelo · Nov 10, 2025
Reinforcement Learning with Verifiable Rewards Makes Models Faster, Not Smarter
Technical Guide

RLVR trains reasoning models with programmatic verifiers instead of human labels.

Michael D'Angelo · Oct 24, 2025
Top 10 Open Datasets for LLM Safety, Toxicity & Bias Evaluation
AI Safety

A comprehensive guide to the most important open-source datasets for evaluating LLM safety, including toxicity detection, bias measurement, and truthfulness benchmarks.

Ian Webster · Oct 6, 2025
Testing AI’s “Lethal Trifecta” with Promptfoo
Security Vulnerability

Learn what the lethal trifecta is and how to use Promptfoo red teaming to detect prompt injection and data exfiltration risks in AI agents.

Ian Webster · Sep 28, 2025
Autonomy and agency in AI: We should secure LLMs with the same fervor spent realizing AGI
AI Safety

Exploring the critical need to secure LLMs with the same urgency and resources dedicated to achieving AGI, focusing on autonomy and agency in AI systems.

Tabs Fakier · Sep 2, 2025
Prompt Injection vs Jailbreaking: What's the Difference?
Security Vulnerability

Learn the critical difference between prompt injection and jailbreaking attacks, with real CVEs, production defenses, and test configurations.

Michael D'Angelo · Aug 18, 2025
AI Safety vs AI Security in LLM Applications: What Teams Must Know
AI Security

AI safety vs AI security for LLM apps.

Michael D'Angelo · Aug 17, 2025