StepFun Unveils StepAudio 2.5 TTS: Pushing the Frontiers of Expressive Speech Generation
Author: Editor

On April 16, 2026, StepFun officially released StepAudio 2.5 TTS, its latest speech generation model. The model offers three key capabilities: global context control, in-text context control, and zero-shot voice cloning with comprehensive timbre adjustment. It supports customizing the emotional tone, character persona, and ambient setting of an entire speech segment, as well as fine-grained adjustment of expressive details such as intonation, pacing, and pauses. Zero-shot voice cloning can be combined with flexible control of emotional style. Users direct speech generation through natural-language instructions, producing multi-layered, highly expressive speech without any complex training process. The model is now fully available on the StepFun open platform and Step Plan, and is suited to applications such as character voiceovers, audio content production, and intelligent voice interaction.