Xiaomi Releases Source Code for Its First In-House End-to-End Speech Large Model

3 days ago

Author: Editor

Today, Xiaomi officially open-sourced its first in-house end-to-end speech model, Xiaomi-MiMo-Audio. Built on a new pre-training framework and trained on hundreds of millions of hours of data, it is the first model to achieve few-shot generalization through in-context learning (ICL) in the speech domain. It also showed clear 'emergent' behavior during pre-training. On a range of standard benchmarks, MiMo-Audio substantially outperformed open-source models of comparable parameter count. It also surpassed Google's closed-source Gemini-2.5-Flash on the MMAU test set, an audio understanding benchmark, and OpenAI's closed-source GPT-4o-Audio-Preview on the Big Bench Audio S2T task, a benchmark for complex audio reasoning.
