ByteDance Unveils Vidi2 Multimodal Large Model, Revolutionizing Video Editing
2025-12-01
Author: Editor

ByteDance has recently released Vidi2, a cutting-edge multimodal large language model with 12 billion parameters. The model is built specifically for video understanding and generation and can handle videos that run for several hours.

In terms of functionality, Vidi2 can automatically organize footage into a coherent narrative, which allows it to assemble short videos or movie clips. One of its standout features is precise spatiotemporal localization: for a specific object or person in a video, it can directly output timestamps marking when that subject appears and bounding boxes marking where it is in the frame.
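To make the idea of spatiotemporal localization output concrete, here is a minimal Python sketch of what such a result might look like. Everything in it is a hypothetical illustration: the `Grounding` and `BoxAtTime` names, the pixel-coordinate box layout, timestamps in seconds, and the example query and numbers are assumptions, not Vidi2's actual output format, which the announcement does not describe.

```python
from dataclasses import dataclass


@dataclass
class BoxAtTime:
    """One localized box: a timestamp plus a bounding box.

    Hypothetical schema for illustration only; not Vidi2's real format.
    """
    t: float   # seconds from the start of the video
    x0: float  # left edge, pixels
    y0: float  # top edge, pixels
    x1: float  # right edge, pixels
    y1: float  # bottom edge, pixels


@dataclass
class Grounding:
    """Hypothetical localization result for one text query."""
    query: str
    boxes: list[BoxAtTime]

    def time_span(self) -> tuple[float, float]:
        """Return the (start, end) seconds covered by the localized boxes."""
        times = [b.t for b in self.boxes]
        return min(times), max(times)


def fmt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS (fractional seconds are truncated)."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Made-up example: a subject located about an hour into a long video.
result = Grounding(
    query="the man in the red jacket",
    boxes=[
        BoxAtTime(3725.0, 120, 80, 340, 460),
        BoxAtTime(3726.0, 130, 82, 352, 462),
        BoxAtTime(3727.5, 141, 85, 360, 465),
    ],
)
start, end = result.time_span()
print(fmt_timestamp(start), "->", fmt_timestamp(end))  # 01:02:05 -> 01:02:07
```

Keeping timestamps as plain seconds and formatting them only for display keeps the arithmetic simple for videos that run for hours.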

Some of Vidi2's capabilities are already integrated into TikTok products. For instance, the Smart Split intelligent editing feature and the AI Outline script-generation feature are both powered by Vidi2.