On March 19, 2026, Xiaomi announced three self-developed large models: Xiaomi MiMo-V2-Pro, Xiaomi MiMo-V2-Omni, and Xiaomi MiMo-V2-TTS. MiMo-V2-Pro is the flagship text foundation model. It has more than 1 trillion total parameters and supports context windows of up to 1 million tokens. MiMo-V2-Omni is a multimodal perception model that handles cross-modal understanding across images, video, audio, and text. MiMo-V2-TTS is a speech synthesis model that supports a wide range of speaking styles and produces highly expressive speech.
Xiaomi founder Lei Jun said the company's combined R&D and capital investment in AI this year will exceed 16 billion yuan. MiMo-V2-Pro and MiMo-V2-Omni are both available to the public through APIs. MiMo-V2-Pro costs $1 per million input tokens and $3 per million output tokens. MiMo-V2-Omni costs $0.40 per million input tokens and $2 per million output tokens. Xiaomi has also partnered with five leading agent framework teams to offer one week of free API access for a limited time.
