Gemini 2.5 Flash Debuts with a Bang: Google's First Hybrid Reasoning Model, with a Toggleable Deep Thinking Mode and Six-Fold Lower Costs
2025-04-18
Author: Editor

Google has unveiled Gemini 2.5 Flash, its first hybrid reasoning model. Its headline feature is an adjustable 'thinking budget': developers can switch the model's deep thinking mode on or off and cap how many tokens it spends reasoning before it answers, which lowers usage costs for tasks that do not need extended thinking. A preview of Gemini 2.5 Flash is now part of the Gemini product line and is available to developers through the API. The model keeps response times fast while substantially strengthening its reasoning, giving developers a single control for balancing quality, cost, and latency.
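The thinking budget is a request-level setting, so toggling deep thinking amounts to changing one number in each API call. The sketch below builds a `generateContent` request body for the Gemini REST API with an explicit budget, using only the Python standard library and without sending anything. Several details are assumptions taken from Google's preview-era developer documentation rather than from this article: the endpoint URL, the model ID `gemini-2.5-flash-preview-04-17`, the `generationConfig.thinkingConfig.thinkingBudget` field, and the 0 to 24576 budget range. The helpers `build_payload` and `build_request` are hypothetical names for illustration.

```python
import json
import urllib.request

# ASSUMPTIONS (not stated in the article): REST endpoint, preview model ID,
# and the allowed thinking-budget range, as described in Google's preview docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"
MODEL_ID = "gemini-2.5-flash-preview-04-17"
MAX_THINKING_BUDGET = 24576


def build_payload(prompt: str, thinking_budget: int) -> dict:
    """Return a generateContent request body with an explicit thinking budget.

    A budget of 0 turns thinking off; a larger value lets the model spend up
    to that many tokens reasoning before it produces its answer.
    """
    if not 0 <= thinking_budget <= MAX_THINKING_BUDGET:
        raise ValueError(f"thinking_budget must be in [0, {MAX_THINKING_BUDGET}]")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


def build_request(prompt: str, thinking_budget: int, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the generateContent endpoint."""
    url = f"{API_ROOT}/{MODEL_ID}:generateContent"
    body = json.dumps(build_payload(prompt, thinking_budget)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    # Thinking off: cheapest and fastest, suited to simple lookups or rewrites.
    fast = build_payload("Summarize this paragraph in one sentence.", thinking_budget=0)
    # Thinking on with a modest cap: more reasoning, higher cost and latency.
    deep = build_payload("Plan a three-step proof that sqrt(2) is irrational.", thinking_budget=1024)
    print(json.dumps(fast["generationConfig"]))
    print(json.dumps(deep["generationConfig"]))
```

Actually sending the request with `urllib.request.urlopen(build_request(...))` would require a valid API key. Google's official SDKs expose the same setting (for example `types.ThinkingConfig(thinking_budget=...)` in the `google-genai` Python package). Keeping the budget as the only varying parameter mirrors the quality, cost, and latency trade-off the article describes.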