Inworld AI Launches Realtime TTS-2: Closed-Loop Voice Model Revolutionizes Emotional Interaction
1 day ago
Author: Editor

According to MarkTechPost, Inworld AI has officially launched Realtime TTS-2, a voice model built on a closed-loop system architecture. Where traditional text-to-speech only converts text into audio, Realtime TTS-2 processes conversational audio in real time and perceives the user's tone, rhythm, and emotional state, which lets it respond in a more natural, human-like way.

The model has four core features. Developers can steer vocal expression precisely with natural-language prompts. The closed-loop architecture makes the model aware of conversational context, so it automatically carries emotion and tone from one turn to the next. Cross-language support lets a single voice identity switch seamlessly among more than 100 languages. Finally, a new 'Advanced Voice Design' function generates reusable voices from text descriptions alone, with no audio samples required.

On the technical side, Realtime TTS-2 combines Realtime STT, a router, and the TTS layer behind a single WebSocket connection, with responses delivered within 200 milliseconds. The generated speech includes human-like features such as natural pauses and interjections, and the model also supports voice cloning and adapts to multiple scenarios.

The launch marks Inworld AI's shift from competing on sound quality to innovating at the behavioral level. Realtime TTS-2 ranks first in the Artificial Analysis Speech Arena, a sign of its technical lead as AI interaction moves toward more emotionally aware, 'human-like' communication.
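To make the closed-loop idea concrete, here is a minimal Python sketch of the flow described above: a transcribed turn goes in, a router picks a speaking style from the conversation so far, and the TTS stage receives that style as a natural-language prompt. It is not Inworld's API. Every name in it (`Turn`, `ConversationState`, `route`, `stub_tts`, `handle_turn`, the style table) is hypothetical, the STT and TTS stages are stubs, and the emotion labels are assumed to come from an upstream speech-to-text stage. The only figure taken from the article is the 200 millisecond response target.

```python
import time
from dataclasses import dataclass, field

# Response target quoted in the article; everything else here is assumed.
LATENCY_BUDGET_S = 0.200

# Hypothetical mapping from a detected user emotion to a style prompt.
STYLE_BY_EMOTION = {
    "frustrated": "calm and reassuring",
    "excited": "upbeat and energetic",
    "neutral": "warm and conversational",
}


@dataclass
class Turn:
    """One user utterance, as a hypothetical STT stage might report it."""
    text: str
    emotion: str


@dataclass
class ConversationState:
    """Context carried across turns so the reply can continue the tone."""
    history: list = field(default_factory=list)

    def current_emotion(self) -> str:
        # Prefer the most recent non-neutral emotion; otherwise neutral.
        for turn in reversed(self.history):
            if turn.emotion != "neutral":
                return turn.emotion
        return "neutral"


def route(state: ConversationState) -> str:
    """Router stage: choose a style prompt from the conversation context."""
    return STYLE_BY_EMOTION.get(state.current_emotion(),
                                STYLE_BY_EMOTION["neutral"])


def stub_tts(reply_text: str, style_prompt: str) -> dict:
    """Stand-in for a TTS call; returns the request it would have sent."""
    return {"text": reply_text, "style_prompt": style_prompt}


def handle_turn(state: ConversationState, turn: Turn, reply_text: str) -> dict:
    """Run one pass of the loop and record whether it met the budget."""
    start = time.perf_counter()
    state.history.append(turn)          # feed the user's turn back in
    request = stub_tts(reply_text, route(state))
    elapsed = time.perf_counter() - start
    request["within_budget"] = elapsed <= LATENCY_BUDGET_S
    return request
```

In this sketch a frustrated first turn selects the "calm and reassuring" style, and a neutral follow-up keeps that style because the state still remembers the earlier emotion. That carry-over is the behavior the article attributes to the closed-loop design. A real system would run these stages over a streaming connection instead of in-process function calls.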