Xiaomi Launches Full-Modal Foundation Model: Xiaomi MiMo-V2-Omni, Empowering Multimodal Perception, Tool Invocation, and Beyond
Author: Editor

On March 19, Xiaomi released Xiaomi MiMo-V2-Omni, a full-modal foundation model designed for the Agent era. Built from the ground up to combine perception with action, the model supports multimodal perception, tool invocation, and other functions. Before the official release, an early test version called 'Healer Alpha' saw usage climb rapidly after it was listed on OpenRouter. It also took the highest average score on PinchBench, the OpenClaw evaluation leaderboard. MiMo-V2-Omni is notable for its audio, image, and video comprehension and for its agent capabilities, and it outperforms competitors in some domains. The model is currently available through API services, and Xiaomi has partnered with five leading Agent development framework teams to give developers free API access for a limited time.
