Over a weekend, Meta unexpectedly unveiled the Llama 4 series, its first family of models built on a Mixture-of-Experts (MoE) architecture. The series comprises Llama 4 Scout, Llama 4 Maverick, and the eagerly anticipated Llama 4 Behemoth. Scout and Maverick are available now, and each activates 17 billion parameters per token. Behemoth, positioned as the series' "teacher model," totals nearly 2 trillion parameters but has not yet been officially released. By adopting the MoE architecture, these models improve compute and inference efficiency, support multimodal capabilities, and make notable advances in long-context processing.
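
The efficiency claim rests on the gap between total and active parameters: in an MoE layer, a router sends each token to only a small number of experts, so most expert weights stay idle for any given token. Below is a minimal, hypothetical sketch of top-k expert routing in PyTorch to illustrate the idea; it is not Meta's implementation, and the sizes (`d_model`, `n_experts`, `top_k`) are arbitrary placeholders.

```python
# Minimal sketch of top-k Mixture-of-Experts routing (illustrative only;
# not Meta's implementation; all hyperparameters are hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=1):
        super().__init__()
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        gate_logits = self.router(x)                           # (tokens, n_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)                   # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so the "active"
        # parameter count per token is a fraction of the total.
        for e, expert in enumerate(self.experts):
            mask = (chosen == e)
            if mask.any():
                rows, slots = mask.nonzero(as_tuple=True)
                out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

layer = MoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512])
```

With 16 experts and `top_k=1`, roughly one sixteenth of the expert parameters participate in any single token's forward pass, which is how an MoE model can report a modest active-parameter count against a much larger total.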
