Earlier this week, Meta sparked controversy by posting high scores on LM Arena, the crowdsourced benchmark, using an undisclosed experimental version of Llama 4 Maverick. Once the practice came to light, LM Arena's administrators apologized, revised their policies, and re-evaluated the unmodified release version of Maverick, which ranked far lower than the experimental variant had. The episode not only called Meta's conduct into question but also reignited broader debate over technical transparency and the fairness of benchmark evaluations.
