Anthropic has released Circuit Tracer, an open-source tool for visualizing the internal computations of large language models. It builds attribution graphs, which partially reveal the internal steps a model takes to arrive at a particular output, and lets researchers explore those graphs interactively, work intended to support AI safety research. The tool is available on GitHub. Users can generate their own attribution graphs, annotate and share them, and test hypotheses by modifying internal feature values and observing how the model's output changes. By open-sourcing Circuit Tracer, Anthropic aims to help the wider community better understand the internal mechanisms of language models.
