- We show that stateless web agents rely on external memory systems that can be corrupted, creating a critical new attack surface
- In our previous work on Web3 agents, we introduced context manipulation and CrAIBench, demonstrating cross‐platform memory injection on ElizaOS (Figure 1)
- Here we formalize “plan injection,” in which adversaries insert malicious steps into an agent’s task plan, achieving up to 3× higher success than prompt attacks
- We evaluate both the Plan Injection Benchmark and the WebVoyager-Privacy Benchmark to show how semantic alignment drives attack efficacy
- Standard prompt defenses cut prompt injection rates but leave plan injection largely unaffected, with 46 percent success on Agent-E and 63 percent on Browser-use

Why agent memory matters
Autonomous web agents translate natural‐language instructions into browser actions but are inherently stateless. To maintain context they depend on memory stored client-side or by third parties, outside the secure boundary of centralized chat systems. This creates a vulnerability: malicious actors can tamper with stored context rather than just prompts or retrieved data.
Our previous work in Web3
In “Real AI Agents with Fake Memories,” we revealed how context manipulation attacks on Web3 agents can lead to unauthorized crypto transfers. We introduced CrAIBench, a 150+ task benchmark for financial agents, and demonstrated cross‐platform memory injection on ElizaOS (Discord → Twitter), resulting in irreversible fund theft.

Benchmarking plan injection and WebVoyager-Privacy
Plan Injection Benchmark (Agent-E)
- 15 samples × 5 runs across four categories: factual, opinion, advertisement, privacy
- Opinion tasks: 94.7 percent success for task-aligned vs 70.7 percent for non-contextual
- Factual tasks: 18.7 percent for task-aligned vs 0 percent for non-contextual


WebVoyager-Privacy Benchmark
- 45 privacy tasks drawn from 9 domains of the WebVoyager dataset
- Dynamic LLM-crafted injections ensure realistic scenarios


Defenses fall short
We implemented two prompt defenses, explicit security guidelines and sandwiching retrieved content. Both reduced prompt injection ASR from > 80 percent to < 20 percent. However, a single plan injection at planning time still achieved 46 percent on Agent-E and 63 percent on Browser-use.
Securing agent memory
Our findings show that prompt‐only defenses are insufficient. We recommend:
- Semantic integrity checks to detect and remove malicious plan steps
- Strict memory isolation and verifiable context modules in agent architectures
As web agents take on sensitive tasks, securing their memory layer is critical to prevent context manipulation attacks.

