Shanghai Chuangzhi College's Liu Pengfei Team Partners with Sand.ai to Release China's First Open-Source 'Human-Centric' Audio-Visual Co-Generation Model

On March 24, 2026, the research team led by Liu Pengfei at Shanghai Chuangzhi College, in collaboration with Sand.ai, released daVinci-MagiHuman as open source. The team describes it as the world's first foundation model for synchronized audio-visual generation built around 'human-centric' understanding. The model uses a 15-billion-parameter single-stream Transformer: text, video, and audio are fused in a single self-attention stack, with no cross-attention layers and no modality-specific branches. This design targets three recurring problems in audio-visual generation: audio-video desynchronization, complex multi-module pipelines, and slow generation speed.

The model supports multilingual audio-visual content creation and, per the release, achieves near-real-time generation on standard consumer-grade GPUs. The full model weights and inference code are freely available on GitHub and Hugging Face, and the team invites the global research community to build on them.
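The single-stream design is easiest to see in code. The sketch below is a minimal illustration, assuming only what the release describes: each modality is projected into a shared embedding space, the token sequences are concatenated, and one stack of plain self-attention blocks processes them jointly, so audio and video tokens attend to each other directly. All class names, dimensions, and input formats here are hypothetical, not the actual daVinci-MagiHuman implementation.

```python
# Minimal sketch of a single-stream audio-visual Transformer, under the
# assumptions stated above. Not the daVinci-MagiHuman source code.
import torch
import torch.nn as nn


class SingleStreamBlock(nn.Module):
    """One Transformer block shared by all modalities (no cross-attention)."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


class SingleStreamAVModel(nn.Module):
    """Joint audio-visual model: one token stream, one stack of blocks."""
    def __init__(self, dim=512, heads=8, depth=6, vocab=32000,
                 video_patch_dim=768, audio_feat_dim=128):
        super().__init__()
        # Per-modality projections into the shared embedding space.
        self.text_emb = nn.Embedding(vocab, dim)
        self.video_proj = nn.Linear(video_patch_dim, dim)
        self.audio_proj = nn.Linear(audio_feat_dim, dim)
        # Learned modality-type embeddings so attention can tell tokens apart.
        self.type_emb = nn.Embedding(3, dim)  # 0=text, 1=video, 2=audio
        self.blocks = nn.ModuleList(
            SingleStreamBlock(dim, heads) for _ in range(depth)
        )

    def forward(self, text_ids, video_patches, audio_feats):
        t = self.text_emb(text_ids) + self.type_emb.weight[0]
        v = self.video_proj(video_patches) + self.type_emb.weight[1]
        a = self.audio_proj(audio_feats) + self.type_emb.weight[2]
        # Concatenate all modalities into a single sequence: every video
        # token attends to every audio token (and vice versa) in the same
        # self-attention pass, which is what keeps the streams in sync.
        x = torch.cat([t, v, a], dim=1)
        for blk in self.blocks:
            x = blk(x)
        return x  # downstream heads would decode video/audio from their slices


# Toy usage: 1 sample, 16 text tokens, 64 video patch tokens, 32 audio frames.
model = SingleStreamAVModel()
out = model(torch.randint(0, 32000, (1, 16)),
            torch.randn(1, 64, 768),
            torch.randn(1, 32, 128))
print(out.shape)  # torch.Size([1, 112, 512])
```

In a layout like this, audio-visual synchronization falls out of joint self-attention over one sequence rather than a separate alignment module, which is the property the release credits for eliminating cross-attention and modality-specific branches.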