Xiaomi Releases Source Code for Its First In-House End-to-End Large Speech Model
3 days ago
Author: Staff editor

Xiaomi today officially released the source code of its first in-house end-to-end speech model, Xiaomi-MiMo-Audio. Built on a new pre-training framework and trained on more than 100 million hours of data, it is the first model to achieve few-shot generalization through in-context learning (ICL) in the speech domain, and it showed clear emergent behavior during pre-training. Across a range of standard benchmarks, MiMo-Audio substantially outperformed open-source models of comparable parameter count. It also surpassed Google's closed-source Gemini-2.5-Flash on the MMAU test set, a benchmark for audio understanding, and OpenAI's closed-source GPT-4o-Audio-Preview on the Big Bench Audio S2T task, a benchmark for complex audio reasoning.