From the Pinnacle to a Plunge: Meta Llama 4's Tumultuous 72 Hours
2025-04-09
Author: Site Editor

On April 8, Chatbot Arena, the preeminent ranking platform for large language models, issued a statement addressing community concerns about the ranking of Meta's latest model, Llama 4. The platform announced that it would publicly release the full data from more than 2,000 head-to-head comparison battles, and it singled out Meta, requiring that Llama-4-Maverick-03-26-Experimental be clearly labeled as a customized model. The move was intended not only to dispel doubts but also to serve as a warning to the broader large-model industry.

Chatbot Arena evaluates models through live blind testing, and its rankings strongly influence how models are perceived and adopted by media outlets and developer communities. After its debut, Llama 4 quickly climbed to second place on the leaderboard, but it soon drew scrutiny over allegations that it had been trained on undisclosed test sets and over its weak showing in certain benchmark tests. The Meta team clarified that Llama 4 was not trained on test sets, while acknowledging problems with the model's performance.

Llama 4's tumultuous ride highlights the complexities and challenges of intensifying competition in the open-source large-model landscape, and it has sparked widespread debate about the genuine capabilities of such models.