ByteDance Unveils Vidi2 Multimodal Large Model, Revolutionizing Video Editing

2025-12-01

Author: Editor

ByteDance has recently released Vidi2, a multimodal large language model with 12 billion parameters. The model is designed for video understanding and generation and can process videos that run for several hours.

Vidi2 can automatically organize a video's narrative into a coherent sequence and can create short videos or movie clips. A standout feature is precise spatiotemporal localization: the model directly outputs timestamps and bounding boxes for specific objects or people in a video.
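To make the idea of spatiotemporal localization concrete, here is a minimal Python sketch of how a timestamp-plus-bounding-box result might be represented. Vidi2's actual output format is not described here, so the `BoxAtTime` class, its field names, the normalized coordinates, and the helper functions are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAtTime:
    """One localized detection (hypothetical schema, not Vidi2's real format)."""
    t_seconds: float  # time offset into the video, in seconds
    x1: float         # left edge, assumed normalized to [0, 1]
    y1: float         # top edge
    x2: float         # right edge
    y2: float         # bottom edge


def time_span(track):
    """Return (start, end) in seconds covered by a list of BoxAtTime."""
    times = [box.t_seconds for box in track]
    return min(times), max(times)


def format_timestamp(seconds):
    """Render seconds as HH:MM:SS, which suits hours-long videos."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Illustrative values only: one object tracked across two consecutive seconds.
track = [
    BoxAtTime(3725.0, 0.10, 0.20, 0.40, 0.80),
    BoxAtTime(3726.0, 0.12, 0.20, 0.42, 0.80),
]
start, end = time_span(track)
print(format_timestamp(start), format_timestamp(end))
```

An editing tool could use a span like this to cut a clip around the moments where a given person or object appears.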

Some of Vidi2's capabilities have already been integrated into TikTok products. For example, the Smart Split intelligent editing and AI Outline script generation features are powered by Vidi2.
