On March 4, 2026, Google officially announced Gemini 3.1 Flash-Lite, describing it as the fastest and most cost-effective model in the Gemini 3 family, built for developers running large, high-throughput workloads. A preview is available now through the Gemini API in Google AI Studio, and enterprise users can access it on the Vertex AI platform. Pricing is set at $0.25 per million input tokens and $1.50 per million output tokens.

In benchmarks run by Artificial Analysis, the model's time to first token is 2.5 times faster than its predecessor's, and its output speed is 45% higher. It scored 1432 on the Arena.ai leaderboard and reached 86.9% on GPQA Diamond and 76.8% on MMMU Pro, outperforming other models in its class.

Gemini 3.1 Flash-Lite also offers a 'thinking tier' feature, which lets developers flexibly adjust the model's reasoning depth to match the complexity of the task at hand. That flexibility makes it suitable for a wide range of applications, from low-cost jobs such as bulk translation and content moderation to more demanding scenarios such as generating user interfaces and building simulated environments.
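To make the announced rates concrete, here is a minimal cost-estimation sketch using the stated preview pricing ($0.25 per million input tokens, $1.50 per million output tokens). The function and variable names are illustrative and not part of any Google SDK:

```python
# Sketch: estimating request cost at the announced preview rates.
# These constants come from the pricing stated in the announcement;
# the function name is hypothetical, not a Google API.

INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a batch of requests."""
    return ((input_tokens / 1_000_000) * INPUT_PRICE_PER_M
            + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M)

# Example: a bulk-translation job with 40M input and 10M output tokens.
cost = estimate_cost(40_000_000, 10_000_000)
print(f"${cost:.2f}")  # → $25.00 (40 * 0.25 + 10 * 1.50)
```

At these rates, high-volume workloads are dominated by output cost once responses grow long, which is why the low-reasoning tiers are pitched at tasks with short outputs such as moderation labels.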
