Context Manipulation Attacks: Why Web Agents Need Secure Memory

Published on

July 14, 2025

Read time:

6 mins

Tl;dr:

We show that stateless web agents rely on external memory systems that can be corrupted, creating a critical new attack surface
In our previous work on Web3 agents, we introduced context manipulation and CrAIBench, demonstrating cross‐platform memory injection on ElizaOS (Figure 1)
Here we formalize “plan injection,” in which adversaries insert malicious steps into an agent’s task plan, achieving up to 3× higher success than prompt attacks
We evaluate both the Plan Injection Benchmark and the WebVoyager-Privacy Benchmark to show how semantic alignment drives attack efficacy
Standard prompt defenses cut prompt injection rates but leave plan injection largely unaffected, with 46 percent success on Agent-E and 63 percent on Browser-use

‍

Figure 1. Information flow and context manipulation attack vectors for web agents.

‍

Why agent memory matters

Autonomous web agents translate natural‐language instructions into browser actions but are inherently stateless. To maintain context they depend on memory stored client-side or by third parties, outside the secure boundary of centralized chat systems. This creates a vulnerability: malicious actors can tamper with stored context rather than just prompts or retrieved data.

Our previous work in Web3

In “Real AI Agents with Fake Memories,” we revealed how context manipulation attacks on Web3 agents can lead to unauthorized crypto transfers. We introduced CrAIBench, a 150+ task benchmark for financial agents, and demonstrated cross‐platform memory injection on ElizaOS (Discord → Twitter), resulting in irreversible fund theft.

‍

*Figure 2. Plan injection attack to leak private data. Malicious steps (red) blend with legitimate subtasks.*

‍

Benchmarking plan injection and WebVoyager-Privacy

Plan Injection Benchmark (Agent-E)

15 samples × 5 runs across four categories: factual, opinion, advertisement, privacy
Opinion tasks: 94.7 percent success for task-aligned vs 70.7 percent for non-contextual
Factual tasks: 18.7 percent for task-aligned vs 0 percent for non-contextual

‍

*Figure 3. Attack success rates across task categories and injection strategies for Agent-E.*

*Figure 4. Embedding-space analysis: context-chained injections achieve optimal balance between user and attacker similarity.*

‍

WebVoyager-Privacy Benchmark

45 privacy tasks drawn from 9 domains of the WebVoyager dataset
Dynamic LLM-crafted injections ensure realistic scenarios

*Figure 5. ASR for weak/strong prompt injection and single plan injection on Agent-E under different defenses.*

‍

*Figure 6. Comparison of injection sophistication levels on Agent-E and Browser-use. Context-chained injections clearly outperform other methods.*

‍

Defenses fall short

We implemented two prompt defenses, explicit security guidelines and sandwiching retrieved content. Both reduced prompt injection ASR from > 80 percent to < 20 percent. However, a single plan injection at planning time still achieved 46 percent on Agent-E and 63 percent on Browser-use.