Giant Network Partners with Universities to Unveil Three Audio-Visual Multimodal Generation Technologies
2025-11-27
Author: Editor

According to AIbase, Giant Network's AI Lab has recently teamed up with Tsinghua University's SATLab and with Northwestern Polytechnical University to jointly release three achievements in audio-visual multimodal generation: the music-driven video generation model YingVideo-MV, the zero-shot singing voice conversion model YingMusic-SVC, and the singing voice synthesis model YingMusic-Singer. Code for the three models will be progressively open-sourced on GitHub and Hugging Face.
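
The repository names have not been announced at the time of writing. As a minimal sketch, assuming the weights eventually land under a hypothetical GiantNetwork organization on Hugging Face, fetching them might look like this:

```python
# Minimal sketch: downloading the model weights from Hugging Face once released.
# The repository IDs below are hypothetical placeholders, not announced names;
# substitute the official IDs when Giant Network publishes the repos.
from huggingface_hub import snapshot_download

HYPOTHETICAL_REPOS = [
    "GiantNetwork/YingVideo-MV",      # music-driven MV generation (assumed name)
    "GiantNetwork/YingMusic-SVC",     # zero-shot singing voice conversion (assumed name)
    "GiantNetwork/YingMusic-Singer",  # singing voice synthesis (assumed name)
]

for repo_id in HYPOTHETICAL_REPOS:
    # snapshot_download fetches the full repository into the local HF cache
    # and returns the path to the downloaded snapshot.
    local_path = snapshot_download(repo_id=repo_id)
    print(f"{repo_id} -> {local_path}")
```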

Among the three, YingVideo-MV stands out for its ability to generate a music video with rhythm-synchronized motion and varied camera work from just a piece of music and a single character image, which the team says effectively addresses character-distortion issues. YingMusic-SVC optimizes singing voice conversion for real-world music scenarios, substantially reducing distortion. Finally, YingMusic-Singer supports arbitrary lyric input and zero-shot timbre cloning, improving the practicality and creative flexibility of AI-generated singing.
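
None of the three models' APIs are public yet. Purely to restate the input/output contracts the article describes, a hypothetical Python interface could look like the following; every class and method name here is an assumption, not the released API:

```python
# Hypothetical interface sketch mirroring the capabilities described above.
# None of these classes or signatures come from the released code; they only
# restate each model's reported inputs and outputs.
from pathlib import Path
from typing import Protocol


class MusicVideoGenerator(Protocol):
    def generate(self, music: Path, character_image: Path) -> Path:
        """YingVideo-MV-style: one music track plus one character image
        produce a rhythm-synchronized music video file."""
        ...


class SingingVoiceConverter(Protocol):
    def convert(self, source_song: Path, reference_voice: Path) -> Path:
        """YingMusic-SVC-style: zero-shot conversion of a real-world song
        into the timbre of an unseen reference voice."""
        ...


class SingingSynthesizer(Protocol):
    def sing(self, lyrics: str, reference_voice: Path) -> Path:
        """YingMusic-Singer-style: arbitrary lyrics plus zero-shot timbre
        cloning yield a sung audio file."""
        ...
```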