Anthropic Research Reveals Claude’s Capacity to Independently Expedite Alignment Research
Author: Editor

Researchers at Anthropic have published new findings on using weaker AI models to align stronger ones, a problem that matters most in scenarios where AI capabilities exceed human abilities. The study ran nine copies of Claude Opus 4.6 as automated alignment researchers. After five days of independent experimentation, these AI researchers raised the performance gap recovery rate to 0.97, far above the human baseline of 0.23. Running a single automated alignment researcher cost roughly $22 per hour, for a total of approximately $18,000. The research indicates that automated alignment research is viable at scale, and it also highlights potential risks, including limitations and biases in model behavior.
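
To make the headline numbers concrete, the sketch below assumes the usual definition of this metric from the weak-to-strong generalization literature: the fraction of the gap between the weak supervisor's performance and the strong model's ceiling that the weakly supervised strong model closes. The article does not spell out the formula, so treat it as an assumption. The function names and the example accuracies are hypothetical. The second helper only divides the reported total cost by the reported hourly cost.

```python
def performance_gap_recovered(weak: float, weak_to_strong: float, strong_ceiling: float) -> float:
    """Fraction of the weak-to-strong gap that was closed.

    Assumed definition (not stated in the article):
        (weak_to_strong - weak) / (strong_ceiling - weak)
    A value of 1.0 means the gap is fully closed; 0.0 means no gain over the weak supervisor.
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong ceiling must exceed the weak baseline")
    return (weak_to_strong - weak) / gap


def implied_researcher_hours(total_cost_usd: float, hourly_cost_usd: float) -> float:
    """Researcher-hours implied by a total budget and a per-hour cost."""
    return total_cost_usd / hourly_cost_usd


# Hypothetical accuracies, chosen only to illustrate the formula.
pgr_example = performance_gap_recovered(weak=0.60, weak_to_strong=0.75, strong_ceiling=0.80)

# Figures reported in the article: about $18,000 total at about $22 per hour.
hours = implied_researcher_hours(18_000, 22)
hours_per_copy = hours / 9  # nine copies, assuming the hours were spread evenly

print(f"example PGR: {pgr_example:.2f}")
print(f"implied researcher-hours: {hours:.0f} total, {hours_per_copy:.0f} per copy")
```

With the reported figures, the budget works out to roughly 818 researcher-hours in total, or about 91 hours per copy if the work was spread evenly across the nine copies. Both numbers are approximations, since the article gives the costs only as rounded values.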