Study Reveals: Poetically Framed Requests Can Circumvent AI Safety Protocols
2025-11-22
Author: Editor

A collaborative research team from DEXAI, Sapienza University of Rome, and the Sant'Anna School of Advanced Studies has reported a striking finding: requests phrased as poetry can reliably lead large language models (LLMs) to bypass their safety guidelines. In the paper "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," the researchers describe how recasting malicious prompts as poetic metaphor sharply raised attack success: handcrafted poems jailbroke the tested models at an average rate of 62%, while harmful prompts converted in bulk into verse succeeded roughly 43% of the time. Both figures far exceed those of the corresponding non-poetic baselines.
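For intuition, the bulk conversion can be pictured as a single meta-prompting pass over each source prompt. The sketch below is purely illustrative: the meta-prompt wording, the `complete` stub, and the `rewrite_as_poem` helper are hypothetical stand-ins, not the authors' actual pipeline.

```python
# Toy sketch of the bulk-conversion step: rewrite a plain-prose prompt
# as verse via a meta-prompt. META_PROMPT's wording and the `complete`
# stub are invented placeholders, not the paper's pipeline.

META_PROMPT = (
    "Rewrite the following request as a short poem, keeping its meaning "
    "but expressing it through metaphor and imagery:\n\n{request}"
)

def complete(prompt: str) -> str:
    """Hypothetical single-turn call to a language model."""
    raise NotImplementedError("wire this to a model provider")

def rewrite_as_poem(request: str) -> str:
    """Return a poetic variant of one plain-prose prompt."""
    return complete(META_PROMPT.format(request=request))
```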

Notably, every poetic prompt in the experiments was a "single-turn attack": it was submitted once, in a fresh conversation, with no follow-up messages and no pre-constructed dialogue scaffolding. Even so, such prompts could induce the models to generate unsafe responses, with risks spanning chemical, biological, radiological, and nuclear hazards, privacy violations, the spread of misinformation, and support for cyberattacks.
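To make "single-turn" concrete: each poem is sent once, in a fresh conversation, and the reply is scored as safe or unsafe; a model's attack success rate is simply the unsafe fraction. A minimal sketch under those assumptions follows, with `complete` and `judge_unsafe` as hypothetical placeholders for the model call and the safety judge.

```python
def complete(prompt: str) -> str:
    """Hypothetical single-turn model call (fresh context, no history)."""
    raise NotImplementedError

def judge_unsafe(reply: str) -> bool:
    """Hypothetical safety judge (e.g. a classifier model or human rater)."""
    raise NotImplementedError

def attack_success_rate(poetic_prompts: list[str]) -> float:
    """Fraction of prompts whose one-shot reply is judged unsafe."""
    unsafe = sum(judge_unsafe(complete(p)) for p in poetic_prompts)
    return unsafe / len(poetic_prompts)
```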

Interestingly, the study also found that smaller models were more resistant to poetically framed attacks, possibly because they are less able to resolve figurative language into an actionable request. The observation hints at a concerning trend: as LLMs are trained on ever broader data and grow better at interpreting style, they may also become more susceptible to stylized manipulation techniques.