Google has unveiled Gemini 2.5 Flash, its first hybrid inference model, which introduces an adjustable "thinking budget": developers can turn the deep reasoning mode on or off, or cap how much of it the model uses, to keep costs down. A preview version of Gemini 2.5 Flash has been added to the Gemini product line and is now available to developers via the API. It retains fast response times while substantially strengthening reasoning ability, giving developers a practical way to tune the trade-off between quality, cost, and latency.
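As a rough illustration of how a developer might set the thinking budget, the sketch below builds a `generateContent`-style request payload. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow the publicly documented Gemini API shape, but they are assumptions here and should be verified against the current API reference before use.

```python
import json

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Sketch of a generateContent payload with a thinking budget.

    A budget of 0 is assumed to disable deep reasoning entirely,
    trading inference depth for lower latency and cost.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed field names; check the Gemini API docs.
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# Example: request a fast, low-cost response with thinking turned off.
payload = build_request("Summarize this article in one sentence.", thinking_budget=0)
print(json.dumps(payload, indent=2))
```

In this sketch, raising the budget would allow the model to spend more tokens on internal reasoning before answering, which is the quality/cost/latency dial the announcement describes.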
