Volcano Engine Unveils Doubao Speech Recognition Model 2.0, Elevating Accuracy in Multi-Language Recognition

2025-12-05

Author: Editor

Volcano Engine has recently released Doubao Speech Recognition Model 2.0 (Doubao-Seed-ASR-2.0). This iteration substantially improves the model's inference capabilities, enabling highly accurate recognition across multiple languages and the use of visual information during recognition.

The model builds on its predecessor's high-performance audio encoder. It improves recognition performance in complex scenarios, achieving more precise recognition through the Proximal Policy Optimization (PPO) algorithm. PPO is a widely used reinforcement learning algorithm, valued for stable training because it limits how far each update can move the model's policy.
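The announcement does not describe how PPO is applied to the model, so the following is only a minimal sketch of the standard clipped surrogate objective that PPO is generally known for, not Volcano Engine's training code. The function name `ppo_clipped_objective`, its parameters, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    Hypothetical helper for illustration only. Inputs are per-sample log
    probabilities under the new and old policies, plus advantage estimates.
    """
    if not advantages:
        raise ValueError("batch must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, every ratio is 1 and the objective reduces to the mean advantage. Once a ratio leaves the clipping interval in the direction that would increase the objective, the clipped term takes over and the sample stops rewarding further movement, which is the source of the stability that PPO is known for.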

The model also has multimodal understanding capabilities: it can take image content into account while performing speech recognition, which reduces recognition errors. For instance, when a speaker refers to something visible in an image, the model can combine the audio and visual cues to improve accuracy.

The model supports 13 foreign languages, significantly broadening the range of cross-language application scenarios. This is valuable as communication across languages becomes increasingly common.

The model has been officially launched and is available through API services, and further improvements are planned. The release showcases Volcano Engine's technical capabilities in speech recognition and is expected to have a far-reaching, positive impact.
