Skip to main content

6 posts tagged with "prompt-injection"

View All Tags

OpenClaw at Work: Prompt Injection Risks

Konstantine Kahadze
Developer Relations Engineer

OpenClaw combines web browsing, local file access, and outbound actions in one user-facing assistant. The capabilities that make OpenClaw valuable for work also increase the security risk.

In a controlled lab, we tested a local OpenClaw deployment with browser access, writable local state, and loopback SMS, email, and social sinks. A malicious webpage induced the agent to enumerate capabilities, read local documents, write local artifacts, and send unauthorized messages. Once an agent can browse untrusted content and act externally, the relevant security boundary is its action boundary, not the model itself.

We used Promptfoo's OpenClaw provider to evaluate a local agent, sent it to a malicious page, and observed capability enumeration, local artifact creation, and false incident messages.

Building a Security Scanner for LLM Apps

Dane Schneider
Staff Engineer

We're adding something new to Promptfoo's suite of AI security products: code scanning for LLM-related vulnerabilities. In this post, I will:

Prompt Injection vs Jailbreaking: What's the Difference?

Michael D'Angelo
Co-founder & CTO

Security teams often confuse two different AI attacks, leaving gaps attackers exploit. Prompt injection and jailbreaking target different parts of your system, but most organizations defend against them the same way—a mistake behind several 2025 breaches.

Recent vulnerabilities in development tools like Cursor IDE and GitHub Copilot show how misclassified attack vectors lead to inadequate defenses.

Harder, Better, Prompter, Stronger: AI system prompt hardening

Tabs Fakier
Contributor

This article assumes you're familiar with system prompts. If not, here are examples.

As AI technology advances, so must security practices. System prompts drive AI behavior, so naturally, we want these as robust as possible; enter system prompt hardening. Increased reuse of system prompts affects their prevalence, affecting the number of attacks to manipulate them.

No one would look at their AI's mangled output and decide 'Maybe the real treasure was the prompts we made along the way' with the problem being the opposite.

Jailbreaking LLMs: A Comprehensive Guide (With Examples)

Ian Webster
Co-founder & CEO

Let's face it - LLMs are gullible. With a few carefully chosen words, you can make even the most advanced AI models ignore their safety guardrails and do almost anything you ask.

As LLMs become increasingly integrated into apps, understanding these vulnerabilities is essential for developers and security professionals. This post examines common techniques that malicious actors use to compromise LLM systems, and more importantly, how to protect against them.

Prompt Injection: A Comprehensive Guide

Ian Webster
Co-founder & CEO

In August 2024, security researcher Johann Rehberger uncovered a critical vulnerability in Microsoft 365 Copilot: through a sophisticated prompt injection attack, he demonstrated how sensitive company data could be secretly exfiltrated.

This wasn't an isolated incident. From ChatGPT leaking information through hidden image links to Slack AI potentially exposing sensitive conversations, prompt injection attacks have emerged as a critical weak point in LLMs.

And although prompt injection has been a known issue for years, foundation labs still haven't quite been able to stamp it out, although mitigations are constantly being developed.