Meituan Unveils Its Native Multimodal Marvel: LongCat-Next
Author: Editor

On March 27, Meituan made a significant move in the AI landscape by releasing and fully open-sourcing its native multimodal large model, LongCat-Next, along with its pivotal component, the Discrete Native Resolution Visual Tokenizer (dNaViT).
This model breaks from the conventional language-dominated architecture of large models: it uniformly converts images, speech, and text into discrete tokens of the same form, creating a unified representation across modalities.
Built on the "Next Token Prediction" (NTP) paradigm, LongCat-Next lets vision and speech serve as native input modalities alongside text, opening new avenues for AI applications to process and understand multimodal data more effectively.
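To make the unified-token idea concrete, here is a minimal sketch of how text, image, and audio tokens might share one vocabulary and be trained under a single next-token objective. All names, offsets, and vocabulary sizes below are illustrative assumptions, not LongCat-Next's actual configuration:

```python
import torch
import torch.nn.functional as F

# Illustrative (assumed) vocabulary layout: text, image, and audio tokens
# share one ID space, offset into disjoint ranges. These sizes are
# hypothetical, not LongCat-Next's actual configuration.
TEXT_VOCAB = 32_000
IMAGE_CODES = 8_192   # e.g. entries in a visual codebook ("dictionary")
AUDIO_CODES = 4_096
IMAGE_OFFSET = TEXT_VOCAB
AUDIO_OFFSET = TEXT_VOCAB + IMAGE_CODES
TOTAL_VOCAB = TEXT_VOCAB + IMAGE_CODES + AUDIO_CODES

def unify(text_ids, image_ids, audio_ids):
    """Map per-modality discrete IDs into one shared token stream."""
    return torch.cat([
        text_ids,
        image_ids + IMAGE_OFFSET,  # shift visual codes past the text range
        audio_ids + AUDIO_OFFSET,  # shift audio codes past the visual range
    ])

# Toy sequence: 5 text tokens, 4 image codes, 3 audio codes.
seq = unify(torch.randint(0, TEXT_VOCAB, (5,)),
            torch.randint(0, IMAGE_CODES, (4,)),
            torch.randint(0, AUDIO_CODES, (3,)))

# Next-token prediction: an autoregressive model over TOTAL_VOCAB predicts
# position t+1 from positions <= t. Random logits stand in for the model.
logits = torch.randn(len(seq) - 1, TOTAL_VOCAB)
loss = F.cross_entropy(logits, seq[1:])
print(f"sequence length: {len(seq)}, NTP loss: {loss.item():.3f}")
```

The key point is that once every modality is an integer ID in the same space, one autoregressive objective covers all of them.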
LongCat-Next rests on three pivotal technological advances. First, the Discrete Native Autoregressive Architecture (DiNA) dismantles the barriers between modalities, allowing multimodal information to be processed in a more fluid and integrated way. Second, the Discrete Native Resolution Visual Tokenizer (dNaViT) constructs a visual "dictionary" that enhances the model's ability to interpret and generate visual content. Third, the Semantically Aligned Complete Encoder tackles the information loss inherent in discretization, ensuring that the model retains the nuances and subtleties of the original data.
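The article does not describe dNaViT's internals, but the "visual dictionary" framing matches standard vector quantization, in which each image-patch embedding is replaced by the index of its nearest codebook entry. The sketch below assumes that generic mechanism purely for illustration; the residual error it prints is the kind of discretization loss the Semantically Aligned Complete Encoder is said to address:

```python
import torch

def quantize(patches: torch.Tensor, codebook: torch.Tensor):
    """Nearest-neighbor lookup: replace each patch embedding with the
    index of its closest codebook entry, i.e. its discrete visual token."""
    dists = torch.cdist(patches, codebook)  # (num_patches, num_codes) Euclidean distances
    indices = dists.argmin(dim=1)           # discrete visual token IDs
    quantized = codebook[indices]           # reconstruction from the "dictionary"
    return indices, quantized

# Toy example: 16 patch embeddings quantized against a 512-entry codebook
# (both randomly initialized here; a real tokenizer learns them).
codebook = torch.randn(512, 64)
patches = torch.randn(16, 64)
ids, recon = quantize(patches, codebook)
print(ids.tolist()[:8])  # first few visual token IDs
# The gap between patches and their reconstructions is the discretization
# loss that a semantically aligned encoder would aim to minimize.
print(f"quantization error: {(patches - recon).pow(2).mean().item():.3f}")
```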
Across visual understanding, image generation, and audio processing, LongCat-Next delivers performance on par with, or even exceeding, that of specialized models. This achievement underscores Meituan's commitment to pushing the boundaries of AI technology and the model's potential to change how we interact with and understand multimodal data.