Google Launches Gemini 3.1 Flash-Lite, Its Fastest and Cheapest AI Model Yet

Gemini 3.1 Flash-Lite branding featuring a multi-colored spark icon and a stylized G composed of blue dots on a black background.

Source: Google

Written by Liz Ticong

Mar 4, 2026



Google is pushing AI speed and scale further with a new lightweight model built for massive workloads. The company has launched Gemini 3.1 Flash-Lite, calling it its fastest and most cost-efficient model for high-volume AI tasks.

Google says the model is intended for developers running high-frequency AI operations and real-time services that require fast responses across large volumes of requests.

Built for scale, priced for production

Gemini 3.1 Flash-Lite enters the Gemini 3 family as a streamlined model tailored for high-throughput environments where speed and efficiency are critical. The model was designed to support large-scale deployments without the overhead typically associated with larger models.

The release arrives first in preview. Developers can reach it through the Gemini API in Google AI Studio, and enterprise teams can reach it through Vertex AI, so organizations can begin testing the model on real workloads as Google expands the Gemini 3 series.

Speed and savings in the same package

Google has also detailed the pricing and performance improvements behind Flash-Lite’s design. The model is priced at $0.25 per one million input tokens and $1.50 per one million output tokens, a structure designed to keep costs manageable for applications that process requests at a large scale.
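To see what those rates mean for a budget, here is a minimal sketch that turns the two published per-token prices into a daily cost estimate. The workload figures (one million requests a day, 500 input tokens and 100 output tokens each) are hypothetical and chosen only for illustration; preview pricing may also change.

```python
# Published preview prices from the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = 0.25
OUTPUT_PRICE_PER_MILLION = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for the given token counts."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MILLION
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload, not taken from the article.
REQUESTS_PER_DAY = 1_000_000
INPUT_TOKENS_PER_REQUEST = 500
OUTPUT_TOKENS_PER_REQUEST = 100

daily_cost = estimate_cost(
    REQUESTS_PER_DAY * INPUT_TOKENS_PER_REQUEST,
    REQUESTS_PER_DAY * OUTPUT_TOKENS_PER_REQUEST,
)
print(f"Estimated daily cost: ${daily_cost:,.2f}")
```

Because output tokens cost six times as much as input tokens, trimming response length tends to move the bill more than trimming prompts does.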

On the performance side, the company reports a 2.5x faster time to first token and 45% faster output speed compared with Gemini 2.5 Flash, helping applications deliver responses more quickly once a prompt is submitted.
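As a rough illustration of how the two speedups combine, the sketch below models response time as time to first token plus generation time. It assumes "2.5x faster" means the wait for the first token is divided by 2.5 and "45% faster" means tokens per second are multiplied by 1.45; the baseline latency, throughput, and response length are hypothetical, since the article gives no absolute numbers.

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Simple model: wait for the first token, then stream the rest at a fixed rate."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline figures, chosen only to make the arithmetic easy to follow.
BASE_TTFT_S = 1.0
BASE_TOKENS_PER_S = 100.0
OUTPUT_TOKENS = 290

# Relative improvements reported by Google (interpretation is an assumption).
TTFT_SPEEDUP = 2.5
OUTPUT_SPEEDUP = 1.45

baseline = response_time(BASE_TTFT_S, BASE_TOKENS_PER_S, OUTPUT_TOKENS)
improved = response_time(
    BASE_TTFT_S / TTFT_SPEEDUP,
    BASE_TOKENS_PER_S * OUTPUT_SPEEDUP,
    OUTPUT_TOKENS,
)
print(f"baseline: {baseline:.2f}s, improved: {improved:.2f}s")
```

Under this model the first-token gain dominates for short responses, while the output-speed gain matters more as responses get longer.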

Those improvements are particularly relevant for systems that handle continuous streams of prompts, such as automated moderation, large-scale translation, or other high-volume services, where even modest gains in response speed can accumulate across millions of interactions.

A closer look at the scorecard

Gemini 3.1 Flash-Lite also holds up well in industry benchmarks that test reasoning and multimodal understanding. The model recorded an Elo score of 1432 on the Arena.ai leaderboard, a ranking system that compares AI models based on head-to-head performance.
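For readers unfamiliar with Elo-style ratings, the sketch below shows how a rating gap translates into an expected head-to-head win rate. It uses the classic Elo logistic formula with a 400-point scale, which is an assumption here; the leaderboard's exact rating method may differ, and the opponent rating is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score for A against B (1.0 = always wins)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


FLASH_LITE_RATING = 1432  # figure reported in the article
OPPONENT_RATING = 1400    # hypothetical competitor, for illustration only

win_rate = expected_score(FLASH_LITE_RATING, OPPONENT_RATING)
print(f"Expected win rate: {win_rate:.1%}")
```

Under this formula, a gap of a few dozen points corresponds to only a modest edge in direct comparisons, so small differences near the top of a leaderboard should be read with care.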

In academic-style evaluations, Flash-Lite scored 86.9% on GPQA Diamond, a benchmark focused on complex reasoning questions, and 76.8% on MMMU-Pro, which measures how well models interpret and reason across text, images, and other media.

According to Google, those results place Flash-Lite ahead of several models in the same category and even above some larger Gemini models from earlier generations.

Flash-Lite begins its real-world tests

Google is also giving developers more control over how the model approaches different tasks. Gemini 3.1 Flash-Lite introduces adjustable “thinking levels,” so teams can tune how much reasoning the system applies before generating a response.

Early access partners have already begun testing the model in production-style environments. Companies including Latitude, Cartwheel, and Whering are experimenting with Flash-Lite in their applications, with developers highlighting consistent structured outputs and reliable instruction-following.

In one example, Whering reported 100% consistency in item tagging when using the model for product classification. Another early tester said Flash-Lite delivered sub-10-second completions with near-instant streaming and roughly 97% structured output compliance during initial deployments.
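A figure like "97% structured output compliance" can be measured in several ways, and the article does not say how the tester did it. One plausible approach, sketched below, counts the share of responses that parse as a JSON object and contain every required field. The field names and sample responses are hypothetical.

```python
import json


def is_compliant(raw: str, required_keys: set[str]) -> bool:
    """True if raw parses as a JSON object containing every required key."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())


def compliance_rate(responses: list[str], required_keys: set[str]) -> float:
    """Fraction of responses that are compliant; 0.0 for an empty list."""
    if not responses:
        return 0.0
    compliant = sum(is_compliant(r, required_keys) for r in responses)
    return compliant / len(responses)


# Hypothetical model outputs for a product-tagging task.
REQUIRED_KEYS = {"category", "tags"}
samples = [
    '{"category": "outerwear", "tags": ["wool", "winter"]}',
    '{"category": "shoes", "tags": []}',
    '{"category": "dress"}',        # missing "tags"
    "Sure! Here is the JSON: {}",   # not valid JSON
]
print(f"Compliance: {compliance_rate(samples, REQUIRED_KEYS):.0%}")
```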

With preview access now underway, Google is inviting developers to begin experimenting with Flash-Lite at scale.


Liz Ticong

Liz Ticong is a staff writer for eWeek and TechRepublic focused on AI, cybersecurity, enterprise software, and data. She has more than 10 years of editorial experience as a technology industry writer, combining reporting, product research, and hands-on software testing in her coverage. Her work has been published on Datamation, Enterprise Networking Planet, and TechnologyAdvice.com. She writes technology news, software reviews, product comparisons, and buyer’s guides for business and IT readers.
