Google has unveiled Gemini 2.5 Flash, its first hybrid inference model, which introduces an adjustable "thinking budget": developers can turn the deep reasoning mode on or off, or cap how much of it the model uses, to keep costs down. A preview version of Gemini 2.5 Flash has been added to the Gemini product line and is now available to developers via the API. It retains fast response times while substantially strengthening reasoning ability, giving developers a practical way to tune the trade-off between quality, cost, and latency.
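As a rough illustration of how a developer might set the thinking budget, the sketch below builds a `generateContent`-style request payload. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow the publicly documented Gemini API shape, but they are assumptions here and should be verified against the current API reference before use.

```python
import json

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Sketch of a generateContent payload with a thinking budget.

    A budget of 0 is assumed to disable deep reasoning entirely,
    trading inference depth for lower latency and cost.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed field names; check the Gemini API docs.
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# Example: request a fast, low-cost response with thinking turned off.
payload = build_request("Summarize this article in one sentence.", thinking_budget=0)
print(json.dumps(payload, indent=2))
```

In this sketch, raising the budget would allow the model to spend more tokens on internal reasoning before answering, which is the quality/cost/latency dial the announcement describes.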
