Tencent Yuanbao Unveils PrismAudio Model: Revolutionizing Video and Ambient Sound Synchronization with Unmatched Efficiency and Precision
Author: Editor

On March 24, 2026, Tencent Yuanbao unveiled PrismAudio, a framework for generating high-fidelity ambient sound for video. It combines the 'Chain-of-Thought' methodology with reinforcement learning in a 'plan-then-execute' generation approach. The aim is ambient sound that stays synchronized with the video content across four dimensions: semantics, timing, aesthetics, and spatiality. With 518 million parameters, PrismAudio can generate a 9-second audio clip in 0.63 seconds. The work has been accepted at ICLR 2026.
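
To put the reported speed in context, the two figures above (9 seconds of audio, 0.63 seconds of generation time) can be turned into a real-time factor. The snippet below is only a back-of-the-envelope sketch: the function name `real_time_factor` is made up for illustration, and the announcement does not say what hardware the timing was measured on.

```python
def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Return seconds of audio produced per second of generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# Figures quoted in the announcement; hardware is not specified there.
rtf = real_time_factor(audio_seconds=9.0, generation_seconds=0.63)
print(f"{rtf:.1f}x faster than real time")  # prints "14.3x faster than real time"
```

In other words, the quoted numbers correspond to generating audio roughly 14 times faster than it takes to play back.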