VoxCPM 1.5 Officially Open-Sourced with Comprehensive Upgrades to Speech Generation
2025-12-11
Author: Editor

On December 10, 2025, ModelBest officially released and open-sourced VoxCPM 1.5. The new version brings substantial improvements in three key areas: audio quality, generation efficiency, and stability.

On audio quality, the AudioVAE sampling rate has been raised from 16 kHz to 44.1 kHz. The higher rate enables high-fidelity voice cloning, so the generated audio preserves more of the detail in the original voice.
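Why the sampling rate matters can be seen with basic signal-processing arithmetic: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist frequency). The sketch below is generic arithmetic, not VoxCPM code; the helper names `nyquist_hz` and `num_samples` are hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate (Nyquist limit)."""
    return sample_rate_hz / 2


def num_samples(duration_s: float, sample_rate_hz: int) -> int:
    """Number of samples needed to store duration_s seconds of audio."""
    return round(duration_s * sample_rate_hz)


# Compare the old and new AudioVAE sampling rates stated in the article.
for sr in (16_000, 44_100):
    print(f"{sr} Hz: Nyquist = {nyquist_hz(sr):.0f} Hz, "
          f"10 s clip = {num_samples(10, sr)} samples")
```

At 16 kHz the upper limit is 8 kHz, which cuts off much of the high-frequency content that gives a voice its brightness and breathiness; at 44.1 kHz the limit rises to 22.05 kHz, covering roughly the full range of human hearing.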

Generation efficiency has doubled: the model now needs only 6.25 tokens to produce one second of audio, half as many as the previous version. Fewer tokens per second of audio means faster generation and lower compute cost.
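To make the token-rate change concrete, the sketch below counts the tokens needed for a minute of speech. The previous rate of 12.5 tokens per second is an assumption inferred from the stated "doubling" relative to 6.25; the helper names are hypothetical. If tokens are generated one step at a time, halving the rate roughly halves the number of generation steps.

```python
import math

PREVIOUS_RATE = 12.5  # assumption: implied by "doubling" relative to 6.25 tokens/s
CURRENT_RATE = 6.25   # tokens per second of audio, as stated for VoxCPM 1.5


def tokens_needed(duration_s: float, tokens_per_second: float) -> int:
    """Tokens required to cover duration_s seconds of audio (rounded up)."""
    return math.ceil(duration_s * tokens_per_second)


def ms_per_token(tokens_per_second: float) -> float:
    """Milliseconds of audio represented by each token."""
    return 1000 / tokens_per_second


duration = 60.0  # one minute of speech
for name, rate in (("previous", PREVIOUS_RATE), ("VoxCPM 1.5", CURRENT_RATE)):
    print(f"{name}: {tokens_needed(duration, rate)} tokens, "
          f"{ms_per_token(rate):.0f} ms of audio per token")
```

Under these assumptions, one minute of audio drops from 750 tokens to 375, and each token covers 160 ms of audio instead of 80 ms.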

To support customization, VoxCPM 1.5 adds scripts for both LoRA (low-rank adaptation) and full fine-tuning, letting users adapt the model's speech generation to their specific requirements.
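The difference between the two fine-tuning modes comes down to how many weights are trained. The sketch below illustrates the general LoRA technique, not VoxCPM's actual scripts: full fine-tuning updates a whole weight matrix W, while LoRA freezes W and trains two small matrices B and A whose product is added to it. The layer size, rank, and all function names are hypothetical.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    # Full fine-tuning updates every entry of the d_out x d_in weight matrix W.
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA trains B (d_out x rank) and A (rank x d_in); W stays frozen.
    return rank * (d_out + d_in)


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, rank: int) -> list[float]:
    """Compute y = W x + (alpha / rank) * B (A x)."""
    base = matvec(W, x)                 # frozen pretrained path
    delta = matvec(B, matvec(A, x))     # trainable low-rank update
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical 1024 x 1024 layer with LoRA rank 8.
print(full_finetune_params(1024, 1024))  # 1048576 trainable weights
print(lora_params(1024, 1024, 8))        # 16384 trainable weights

# Tiny forward pass: identity W plus a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # 1 x 2
B = [[1.0], [0.0]]  # 2 x 1
print(lora_forward(W, A, B, [2.0, 3.0], alpha=1.0, rank=1))  # [7.0, 3.0]
```

In this hypothetical layer, LoRA trains about 1.6% as many weights as full fine-tuning, which is why it is the lighter option when compute or data is limited; full fine-tuning remains available when maximum flexibility is worth the cost.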

Stability in long-text generation has also been improved, significantly reducing audio artifacts so that even lengthy outputs remain smooth and high-quality.

The model is now available on GitHub and Hugging Face, where developers can download it, experiment with it, and contribute to its further development.