Ant Group Makes Full-Modal Large Model Ming-Flash-Omni 2.0 Open-Source
Author: Editor

On February 11, 2026, Ant Group officially open-sourced its full-modal large model Ming-Flash-Omni 2.0. Ant Group presents it as the industry's first unified audio generation model covering all scenarios: it can generate speech, ambient sound effects, and music together on a single audio track, and users can precisely control parameters such as voice timbre, speech rate, intonation, volume, emotion, and dialect through natural-language instructions.
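The announcement does not include sample code, but instruction-based control of this kind typically embeds the desired voice attributes directly in the text prompt. The sketch below assumes the model is exposed through the generic Hugging Face transformers text-to-audio pipeline and uses a guessed repository id; the model's actual inference entry point may differ, so check the official model card before running it.

```python
# A minimal sketch, assuming the model works with the generic
# transformers "text-to-audio" pipeline; the repo id is a guess,
# not the confirmed release name.
from transformers import pipeline
import scipy.io.wavfile as wavfile

tts = pipeline("text-to-audio", model="inclusionAI/Ming-Flash-Omni-2.0")  # assumed repo id

# Control is expressed in the natural-language instruction itself:
# timbre, pace, volume, emotion, and dialect all live in the prompt.
instruction = (
    "Speak as a calm elderly male narrator, slow pace, low volume, "
    "gentle Cantonese accent: 'The harbor was quiet at dawn.'"
)

out = tts(instruction)
wavfile.write("controlled_speech.wav", rate=out["sampling_rate"],
              data=out["audio"].squeeze())
```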

At inference time, the model runs at a frame rate of just 3.1 Hz, enabling real-time, high-fidelity generation of audio clips lasting several minutes. Across numerous public benchmarks, Ming-Flash-Omni 2.0 performs strongly on core capabilities such as visual-language understanding, controllable speech generation, and image generation and editing, with some metrics surpassing Gemini 2.5 Pro.
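To put the 3.1 Hz figure in perspective, a back-of-the-envelope calculation shows why minutes-long real-time generation is feasible. The sketch below assumes one decoding step per generated frame, which is an interpretation of the announced number rather than a documented implementation detail.

```python
import math

# Back-of-the-envelope: decoding steps needed at a 3.1 Hz inference
# frame rate, assuming one autoregressive step per generated frame
# (an assumption, not a documented detail of the model).
FRAME_RATE_HZ = 3.1  # inference frames per second of output audio

def decoding_steps(duration_seconds: float) -> int:
    """Steps required to cover `duration_seconds` of generated audio."""
    return math.ceil(duration_seconds * FRAME_RATE_HZ)

for minutes in (1, 3, 5):
    print(f"{minutes} min of audio -> ~{decoding_steps(minutes * 60)} steps")
# 1 min -> 186 steps, 3 min -> 558, 5 min -> 930: a low enough step
# count that generation can comfortably outpace real-time playback.
```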

The model weights and inference code are now available on open-source platforms such as Hugging Face, and developers can also try the model online through Ant Group's Ling Studio platform.
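For developers who want the weights locally, the standard huggingface_hub download flow should apply; the repository id below is an assumption, so verify it against the official release page.

```python
# A minimal download sketch using huggingface_hub's snapshot_download,
# which fetches all files from a model repository. The repo id is an
# assumed placeholder; confirm the real one on the release page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="inclusionAI/Ming-Flash-Omni-2.0")  # assumed id
print(f"Model files downloaded to: {local_dir}")
```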