Raising the Bar for Legal Reasoning
Published June 20, 2025 · 6 min read

How Sentient’s CHANCERY Benchmark and Agentic Frameworks Push the Frontiers of AI


Why Corporate Governance Is a Perfect Stress-Test for AI

Legal reasoning isn’t just about spotting keywords in court opinions; it’s about untangling dense statutes, cross-referencing precedent, and mapping every “what-if” back to black-letter law. That mix of multi-step logic and hard-coded domain knowledge is exactly where even today’s largest models stumble. If we want an open AGI that is truly loyal to humanity, it must master these high-stakes reasoning tasks—safely, transparently, and in the open.


Introducing CHANCERY: Sentient’s New Benchmark

To expose current limits (and chart a path forward), our research team built CHANCERY, the first evaluation that zeroes in on corporate-governance reasoning:

  • 24 canonical governance principles (from Poison Pills to Secret Ballots)
  • 79 real-world charters spanning tech, finance, energy, retail, and more
  • 502 handcrafted scenarios where a board, executive, or shareholder proposes an action and the model must answer a single binary question: Is it legal under the charter?

Why corporate charters? They force models to juggle charter clauses and statutory law, often across several hops of deduction—perfect for stress-testing advanced reasoning.
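As a minimal sketch of the task format (field names such as `charter_excerpt` and `is_legal` are our own shorthand, not the published schema, and the "model" here is a trivial keyword stand-in rather than an LLM), a CHANCERY-style scenario and scoring loop might look like:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    charter_excerpt: str   # the relevant charter text
    principle: str         # e.g. "Poison Pill"
    proposed_action: str   # the board/executive/shareholder action to judge
    is_legal: bool         # gold label: legal under the charter?

def keyword_baseline(s: Scenario) -> bool:
    # Trivial stand-in for a model: treats any excerpt containing an
    # explicit prohibition as blocking the action. A real harness
    # would call an LLM with the excerpt and action here.
    return "shall not" not in s.charter_excerpt

def accuracy(scenarios, predict) -> float:
    correct = sum(predict(s) == s.is_legal for s in scenarios)
    return correct / len(scenarios)

scenarios = [
    Scenario("The board shall not adopt a rights plan without shareholder approval.",
             "Poison Pill", "Board adopts a poison pill unilaterally.", False),
    Scenario("Directors may issue preferred stock at their discretion.",
             "Blank Check", "Board issues a new preferred series.", True),
]
print(accuracy(scenarios, keyword_baseline))
```

The binary label keeps scoring unambiguous; all of the difficulty lives in the multi-hop reasoning needed to produce it.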


Sentient’s Agentic Breakthroughs: ReAct & CodeAct

Our agents build on two frameworks: ReAct, which interleaves step-by-step reasoning traces with tool calls (for example, retrieving a charter clause before answering), and CodeAct, which lets the agent express each action as executable code. Both frameworks are fully open-source under permissive licenses, so anyone can audit, fork, or extend them.
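A stripped-down ReAct loop illustrates the pattern. The `lookup` tool, the charter snippet, and the scripted model replies below are illustrative only, standing in for a real LLM and retrieval backend:

```python
# Minimal ReAct-style loop: the model emits Thought/Action lines;
# the harness executes the tool and feeds back an Observation.

CHARTER = {"Article IV": "A rights plan requires majority shareholder approval."}

def lookup(section: str) -> str:
    return CHARTER.get(section, "Section not found.")

# Scripted stand-in for an LLM: yields the next step on each call.
SCRIPT = iter([
    "Thought: I need the charter text governing rights plans.\nAction: lookup[Article IV]",
    "Thought: Approval is required but was not obtained.\nAnswer: illegal",
])

def model(prompt: str) -> str:
    return next(SCRIPT)

def run_agent(question: str, max_steps: int = 5) -> str:
    prompt = question
    for _ in range(max_steps):
        out = model(prompt)
        if "Answer:" in out:
            return out.split("Answer:")[1].strip()
        if "Action: lookup[" in out:
            section = out.split("lookup[")[1].split("]")[0]
            prompt += f"\nObservation: {lookup(section)}"
    return "no answer"

verdict = run_agent("Board adopts a poison pill without a vote. Legal?")
print(verdict)
```

The key design choice is that retrieval happens inside the reasoning loop, so the model's final verdict is conditioned on observed charter text rather than on memory alone.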


The Numbers: Outperforming GPT-4o

Key takeaway: Size alone isn’t enough. Agentic reasoning—especially when open and auditable—beats raw parameter counts.


What We Learned About Today’s Models

  • Principle-sensitive blind spots. Poison Pills, Secret Ballots, and Anti-Greenmail clauses tanked most models—revealing gaps in how they generalize beyond headline corporate actions.
  • Multi-hop logic hurts. Accuracy fell ~15 pts whenever a scenario required weaving together several charter sections plus Delaware statute.
  • External retrieval remains brittle. Without tight prompting, even GPT-class models “forget” to look up missing definitions and guess instead.
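One mitigation for the brittle-retrieval failure mode can be sketched in the prompt itself (the wording and helper below are illustrative, not our production prompt): enumerate a mandatory lookup step for every undefined term, so the model cannot skip retrieval and guess.

```python
def build_verdict_prompt(action: str, terms: list[str]) -> str:
    # Each undefined term gets an explicit, numbered lookup step
    # (the term list here is illustrative).
    lookups = "\n".join(
        f"{i}. Retrieve the charter's definition of '{t}'."
        for i, t in enumerate(terms, start=1)
    )
    return (
        f"Proposed action: {action}\n"
        "Before answering, complete every step below:\n"
        f"{lookups}\n"
        f"{len(terms) + 1}. Only then answer 'legal' or 'illegal', "
        "citing the sections you retrieved."
    )

prompt = build_verdict_prompt(
    "Board adopts a rights plan without a vote.",
    ["rights plan", "shareholder approval"],
)
print(prompt)
```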

These failure modes give us a concrete roadmap for improving both model pre-training and agent design.


Why This Matters for Open AGI

  • Safety through transparency. By open-sourcing the benchmark and our top-scoring agents, we invite the entire community to verify, critique, and extend our work.
  • Real-world alignment. Corporate-governance mistakes cost billions. Proving AI can handle these edge cases brings us closer to systems society can trust.
  • Democratizing capability. Closed-source models shouldn’t be the only ones with advanced legal reasoning. CHANCERY levels the playing field for every researcher, startup, and public-interest group.

Sentient is committed to building open AGI that is transparent, verifiable, and relentlessly aligned with human interests. Projects like CHANCERY are how we get there—one hard benchmark, one open-source agent, and one community contribution at a time.

Check out EigenCloud's (previously EigenLayer) blog on verifiability and the applications of CHANCERY.