Alibaba's Qwen Team Open-Sources the Qwen3-ASR Series of Speech Recognition Models
Author: Editor

On January 29, 2026, Alibaba's Qwen team open-sourced the Qwen3-ASR series of speech recognition models. The series comprises two speech recognition models, Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, along with Qwen3-ForcedAligner-0.6B, a forced alignment model. The recognition models cover 52 languages and dialects. The 1.7B model is reported to reach state-of-the-art (SOTA) performance in several scenarios, including Chinese, English, accented Chinese, and singing voice recognition. The 0.6B model balances accuracy against efficiency: serving 128 concurrent asynchronous inference requests, it reaches a throughput of about 2,000 times real time, processing 5 hours of audio in roughly 10 seconds. The forced alignment model predicts high-precision timestamps for 11 languages, with accuracy exceeding that of traditional forced alignment models. The open-source release includes the model architecture and weights as well as the inference frameworks.
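The throughput figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: the helper name `speedup_over_real_time` is hypothetical, and the per-stream figure assumes the load is spread evenly across the 128 concurrent requests, which the announcement does not state. It shows that 5 hours of audio in 10 seconds corresponds to roughly 1,800 times real time, the same order as the rounded 2,000 figure.

```python
def speedup_over_real_time(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


# Figures quoted in the announcement.
audio_seconds = 5 * 3600   # 5 hours of audio
wall_seconds = 10          # processed in about 10 seconds
concurrency = 128          # concurrent asynchronous requests

aggregate = speedup_over_real_time(audio_seconds, wall_seconds)

# Assumption: work is split evenly across all concurrent requests.
per_stream = aggregate / concurrency

print(f"aggregate speedup: {aggregate:.0f}x real time")    # 1800x
print(f"per-stream speedup: {per_stream:.2f}x real time")  # 14.06x
```

The quoted numbers appear to be rounded, so the small gap between 1,800 and 2,000 should not be read as a discrepancy.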