Volcano Engine Unveils First All-Modal Comprehension Model in Doubao Large Model Series, Enabling Unified Interpretation of Video, Image, Audio, and More
Author: Staff editor

On May 6, 2026, Volcano Engine, a division of ByteDance, announced that the Doubao large model family has released its first all-modal comprehension model, an upgraded version of Doubao-Seed-2.0-lite. The model is designed to natively interpret video, image, audio, and text within a single unified system, and the release also brings improvements to its agent, coding, and GUI capabilities. It is now available on the Volcano Ark platform. Alongside it, Volcano Engine introduced a new version of Doubao-Seed-2.0-mini, which also supports all-modal comprehension and is reported to cut processing time by 40% and raise token utilization efficiency by 35%.