Volcano Engine Unveils First All-Modal Comprehension Model in Doubao Large Model Series, Enabling Unified Interpretation of Video, Image, Audio, and More


Author: Editor

On May 6, 2026, Volcano Engine, a division of ByteDance, announced that the Doubao large model family has released its first all-modal comprehension model, an upgraded version of Doubao-Seed-2.0-lite. The model natively interprets video, image, audio, and text in a unified way, and its Agent, Coding, and GUI capabilities have been upgraded at the same time. It is now available on the Volcano Ark platform. A new version of Doubao-Seed-2.0-mini was released alongside it, also with all-modal comprehension, a 40% reduction in processing time, and a 35% improvement in token utilization efficiency.
