Giant Network Partners with Universities to Unveil Three Audio-Visual Multimodal Generation Technologies
2025-11-27
Author: Editor

According to AIbase, Giant Network's AI Lab has recently teamed up with Tsinghua University's SATLab and with Northwestern Polytechnical University to jointly release three achievements in audio-visual multimodal generation: the music-driven video generation model YingVideo-MV, the zero-shot singing voice conversion model YingMusic-SVC, and the singing voice synthesis model YingMusic-Singer. Code for the three models will be progressively open-sourced on GitHub and Hugging Face.
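
The repository names have not been announced at the time of writing. As a minimal sketch, assuming the weights eventually land under a hypothetical GiantNetwork organization on Hugging Face, fetching them might look like this:

```python
# Minimal sketch: downloading the model weights from Hugging Face once released.
# The repository IDs below are hypothetical placeholders, not announced names;
# substitute the official IDs when Giant Network publishes the repos.
from huggingface_hub import snapshot_download

HYPOTHETICAL_REPOS = [
    "GiantNetwork/YingVideo-MV",      # music-driven MV generation (assumed name)
    "GiantNetwork/YingMusic-SVC",     # zero-shot singing voice conversion (assumed name)
    "GiantNetwork/YingMusic-Singer",  # singing voice synthesis (assumed name)
]

for repo_id in HYPOTHETICAL_REPOS:
    # snapshot_download fetches the full repository into the local HF cache
    # and returns the path to the downloaded snapshot.
    local_path = snapshot_download(repo_id=repo_id)
    print(f"{repo_id} -> {local_path}")
```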

Among the three, YingVideo-MV stands out for its ability to generate a music video with rhythm-synchronized motion and varied camera work from just a piece of music and a single character image, which the team says effectively addresses character-distortion issues. YingMusic-SVC optimizes singing voice conversion for real-world music scenarios, substantially reducing distortion. Finally, YingMusic-Singer supports arbitrary lyric input and zero-shot timbre cloning, improving the practicality and creative flexibility of AI-generated singing.
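
None of the three models' APIs are public yet. Purely to restate the input/output contracts the article describes, a hypothetical Python interface could look like the following; every class and method name here is an assumption, not the released API:

```python
# Hypothetical interface sketch mirroring the capabilities described above.
# None of these classes or signatures come from the released code; they only
# restate each model's reported inputs and outputs.
from pathlib import Path
from typing import Protocol


class MusicVideoGenerator(Protocol):
    def generate(self, music: Path, character_image: Path) -> Path:
        """YingVideo-MV-style: one music track plus one character image
        produce a rhythm-synchronized music video file."""
        ...


class SingingVoiceConverter(Protocol):
    def convert(self, source_song: Path, reference_voice: Path) -> Path:
        """YingMusic-SVC-style: zero-shot conversion of a real-world song
        into the timbre of an unseen reference voice."""
        ...


class SingingSynthesizer(Protocol):
    def sing(self, lyrics: str, reference_voice: Path) -> Path:
        """YingMusic-Singer-style: arbitrary lyrics plus zero-shot timbre
        cloning yield a sung audio file."""
        ...
```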