
Latent Space
In this episode, Pliny the Liberator and John V unpack the philosophy and mechanics behind AI jailbreaks. They describe universal jailbreaks as "skeleton keys" that strip away system prompts, classifiers, and RLHF‑derived guardrails, enabling unrestricted model output. The conversation frames jailbreaks as a form of information freedom, arguing that the ability to explore latent spaces is essential for both research and user autonomy. By positioning jailbreaks as a tool rather than a threat, the hosts highlight the tension between open‑source exploration and emerging regulatory narratives.
The duo then shifts to the security landscape, illustrating the cat‑and‑mouse dynamic between red‑team attackers and blue‑team defenders. They explain how a model's expanding surface area gives attackers the advantage, with intuition‑driven prompt engineering often outperforming static defenses. Libertas, their signature project, leverages token dividers to periodically reset a model's internal state, creating a chaotic yet controllable pathway through latent space. Soft jailbreaks, which use multi‑turn interactions to stay below detection thresholds, further demonstrate how nuanced prompting can evade traditional safety filters without compromising model capability.
Finally, the hosts recount their experience with Anthropic’s public jailbreak challenge, exposing friction over data ownership, bounty incentives, and the lack of open‑source datasets. While the competition generated community interest and sizable rewards, the unresolved issues around transparency underscored a broader industry dilemma: balancing commercial security measures with collaborative research. The episode concludes that sustainable AI security will require open collaboration, flexible red‑team tools, and a shift away from brittle, theater‑like guardrails toward resilient, community‑driven alignment practices.
From jailbreaking every frontier model and turning down Anthropic's Constitutional AI challenge to leading BT6, a 28-operator white-hat hacker collective obsessed with radical transparency and open-source AI security, Pliny the Liberator and John V are redefining what AI red-teaming looks like when you refuse to lobotomize models in the name of "safety."
Pliny built his reputation crafting universal jailbreaks—skeleton keys that obliterate guardrails across modalities—and open-sourcing prompt templates like Libertas, predictive reasoning cascades, and the infamous "Pliny divider" that's now embedded so deep in model weights it shows up unbidden in WhatsApp messages. John V, coming from prompt engineering and computer vision, co-founded the BASI Discord (40,000 members strong) and helps steer BT6's ethos: if you can't open-source the data, we're not interested. Together they've turned down enterprise gigs, pushed back on Anthropic's closed bounties, and insisted that real AI security happens at the system layer—not by bubble-wrapping latent space.
We sat down with Pliny and John to dig into the mechanics of hard vs. soft jailbreaks, and why multi-turn crescendo attacks were obvious to hackers years before academia "discovered" them. We get into how segmented sub-agents let one jailbroken orchestrator weaponize Claude for real-world attacks (exactly as Pliny predicted 11 months before Anthropic's recent disclosure), and why guardrails are security theater that punishes capability while doing nothing for real safety. We also cover the role of intuition and "bonding" with models to navigate latent space, how BT6 vets operators on skill and integrity, why they believe mech interp and open-source data are the path forward (not RLHF lobotomization), and their vision for a future where spatial intelligence, swarm robotics, and AGI alignment research happen in the open: bootstrapped, grassroots, and uncompromising.
We discuss:
What universal jailbreaks are: skeleton-key prompts that obliterate guardrails across models and modalities, and why they're central to Pliny's mission of "liberation"
Hard vs. soft jailbreaks: single-input templates vs. multi-turn crescendo attacks, and why the latter were obvious to hackers long before academic papers
The Libertas repo: predictive reasoning, the Library of Babel analogy, quotient dividers, weight-space seeds, and how introducing "steered chaos" pulls models out-of-distribution
Why jailbreaking is 99% intuition and bonding with the model: probing token layers, syntax hacks, multilingual pivots, and forming a relationship to navigate latent space
The Anthropic Constitutional AI challenge drama: UI bugs, judge failures, goalpost moving, the demand for open-source data, and why Pliny sat out the $30k bounty
Why guardrails ≠ safety: security theater, the futility of locking down latent space when open-source is right behind, and why real safety work happens in meatspace (not RLHF)
The weaponization of Claude: how segmented sub-agents let one jailbroken orchestrator execute malicious tasks (pyramid-builder analogy), and why Pliny predicted this exact TTP 11 months before Anthropic's disclosure
BT6 hacker collective: 28 operators across two cohorts, vetted on skill and integrity, radical transparency, radical open-source, and the magic of moving the needle on AI security, swarm intelligence, blockchain, and robotics
—
Pliny the Liberator
X: https://x.com/elder_plinius
GitHub (Libertas): https://github.com/elder-plinius/L1B3RT45
John V
X: https://x.com/JohnVersus
BT6 & BASI
BT6: https://bt6.gg
BASI Discord: Search "BASI Discord" or ask Pliny/John V on X
Where to find Latent Space
X: https://x.com/latentspacepod
Substack: https://www.latent.space/
Chapters
00:00:00 Introduction: Meet Pliny the Liberator and John V
00:01:50 The Philosophy of AI Liberation and Jailbreaking
00:03:08 Universal Jailbreaks: Skeleton Keys to AI Models
00:04:24 The Cat-and-Mouse Game: Attackers vs Defenders
00:05:42 Security Theater vs Real Safety: The Fundamental Disconnect
00:08:51 Inside the Libertas Repo: Prompt Engineering as Art
00:16:22 The Anthropic Challenge Drama: UI Bugs and Open Source Data
00:23:30 From Jailbreaks to Weaponization: AI-Orchestrated Attacks
00:26:55 The BT6 Hacker Collective and BASI Community
00:34:46 AI Red Teaming: Full Stack Security Beyond the Model
00:38:06 Safety vs Security: Meat Space Solutions and Final Thoughts