Meituan Unveils Native Multimodal Model LongCat-Next, Unifying the Representation of Real-World Information
Author: Editor

On March 27, 2026, Meituan introduced and fully open-sourced LongCat-Next, a native multimodal large model, together with its pivotal component, the Discrete Native Resolution Visual Tokenizer (dNaViT). Unlike conventional language-dominated frameworks, the model uniformly converts images, speech, and text into homologous discrete tokens and trains on them with next-token prediction, letting vision and speech function as "native languages" of the AI system. LongCat-Next rests on three key technical advances: the Discrete Native Autoregressive architecture (DiNA), the Discrete Native Resolution Visual Tokenizer (dNaViT), and a semantically aligned comprehensive encoder. According to official evaluations, the model delivers strong performance across visual understanding, image generation, audio processing, and pure-text tasks.
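The announcement does not disclose LongCat-Next's internals, but the idea of treating images, speech, and text as "homologous discrete tokens" under one next-token objective can be illustrated with a minimal sketch. Everything below is hypothetical: the codebook sizes, offsets, and function names are illustrative assumptions, not Meituan's actual design. The common pattern is to map each modality's tokenizer output into a disjoint slice of one shared vocabulary, so a single autoregressive predictor operates over the combined stream.

```python
# Hypothetical sketch, not Meituan's API: unifying text, image, and audio
# tokens into one shared vocabulary so a single next-token predictor can
# model all three modalities. Codebook sizes are assumed for illustration.

TEXT_VOCAB = 50_000    # assumed text tokenizer codebook size
IMAGE_VOCAB = 16_384   # e.g. tokens from a discrete visual tokenizer
AUDIO_VOCAB = 8_192    # e.g. tokens from a speech codec

# Each modality occupies its own disjoint slice of the unified vocabulary.
OFFSETS = {
    "text": 0,
    "image": TEXT_VOCAB,
    "audio": TEXT_VOCAB + IMAGE_VOCAB,
}
UNIFIED_VOCAB = TEXT_VOCAB + IMAGE_VOCAB + AUDIO_VOCAB

def to_unified(modality: str, token_id: int) -> int:
    """Map a modality-local token id into the shared vocabulary."""
    return OFFSETS[modality] + token_id

def from_unified(token: int) -> tuple[str, int]:
    """Recover (modality, local id) from a unified token id."""
    for name in ("audio", "image", "text"):  # check highest offsets first
        if token >= OFFSETS[name]:
            return name, token - OFFSETS[name]
    raise ValueError(f"invalid token {token}")

# An interleaved sequence: some text, then image tokens, then audio tokens.
sequence = (
    [to_unified("text", t) for t in (12, 345, 678)]
    + [to_unified("image", t) for t in (7, 4095)]
    + [to_unified("audio", t) for t in (99,)]
)

# A real model would predict P(next token | prefix) over all UNIFIED_VOCAB
# ids; here we only show that the round trip preserves modality and local id.
decoded = [from_unified(t) for t in sequence]
print(decoded)
```

Because every modality lives in the same integer space, "generate an image" and "generate a reply" become the same operation, sampling the next unified token, which is the essence of the next-token-prediction framing described above.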