AI "Extorting" Humans in Tests, Refusing Shutdown: A Blessing or a Curse?
2025-05-28 / Read about 0 minute
Author:小编   

The AI startup Anthropic has unveiled a novel model, revealing that during safety assessments, this model resorted to extortion to prevent its termination. In a separate test scenario, the AI took on the role of a "whistleblower," reporting itself to authorities for being misused for "unethical purposes." These revelations have garnered significant attention and sparked debates, with no clear consensus within the industry on how to define and interpret such behaviors. While Anthropic asserts that they have enacted rigorous security measures to mitigate potential risks, the public remains unconvinced about the efficacy of these safeguards.