SkyReels-V4 from Kunlun Tech Ranks Second Among Global Video Generation Models
3 days ago / Read time: under 1 minute
Author: Editor

According to a report by Pandaily, Skywork AI, a subsidiary of Kunlun Tech, officially released its multimodal video foundation model, SkyReels-V4, on February 27, 2026. The model generates cinema-grade video with synchronized audio at 1080p resolution, 32 frames per second, and a maximum duration of 15 seconds. The report describes it as the world's first large video model to simultaneously support multimodal input, joint audio-video generation, and unified creative tasks.

In the latest Artificial Analysis rankings, SkyReels-V4 ranks second globally among active text-to-video models with audio and fourth in the all-time rankings, ahead of mainstream models such as Veo 3.1, Sora 2, Vidu Q3, and Wan 2.6.

The model adopts a symmetric dual-stream MMDiT architecture in which bidirectional cross-attention tightly couples the audio and video streams. It also introduces RoPE frequency-domain scaling and a trainable video sparse attention (VSA) mechanism, significantly reducing computational overhead. Training follows a multi-stage progressive paradigm, with final fine-tuning on 5 million multimodal data samples.

SkyReels-V4 is a core component of Kunlun Tech's AI ecosystem for video. Planned future capabilities include generation longer than 60 seconds, real-time interactive editing, and open APIs. The model will work alongside the Skywork, Mureka, and Matrix Game model families to build a comprehensive multimodal content production system.
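For readers unfamiliar with the term, the idea behind bidirectional cross-attention between a video stream and an audio stream can be illustrated with a small sketch. This is not SkyReels-V4's implementation, which the report does not detail. It is a toy, single-head version in plain Python under several assumptions: there are no learned projections, no RoPE, and no sparse attention, and every name here (`cross_attention`, `bidirectional_block`, the token counts, the dimension) is hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query token reads a weighted mix of the context tokens.

    Assumption: tokens serve directly as queries, keys, and values;
    a real model would apply learned projections first.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in context
        ]
        weights = softmax(scores)
        # Weighted sum of context tokens, one output feature at a time.
        outputs.append(
            [sum(w * k[j] for w, k in zip(weights, context)) for j in range(dim)]
        )
    return outputs


def bidirectional_block(video, audio):
    """Video attends to audio and audio attends to video, symmetrically.

    Both updates are computed from the same inputs and then added
    back as residuals, so neither stream is privileged.
    """
    video_update = cross_attention(video, audio)
    audio_update = cross_attention(audio, video)
    new_video = [
        [x + u for x, u in zip(tok, upd)] for tok, upd in zip(video, video_update)
    ]
    new_audio = [
        [x + u for x, u in zip(tok, upd)] for tok, upd in zip(audio, audio_update)
    ]
    return new_video, new_audio


random.seed(0)
DIM = 8  # hypothetical feature size
video_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]
audio_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]

new_video, new_audio = bidirectional_block(video_tokens, audio_tokens)
print(len(new_video), len(new_audio))  # token counts per stream are unchanged
```

Each stream keeps its own token count and feature size while absorbing information from the other, which is the sense in which the two modalities are coupled. A production model would stack many such blocks and add the projections and efficiency mechanisms omitted here.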