Google Launches Gemini 3.1 Flash-Lite, Its Fastest and Cheapest AI Model Yet

Image: Gemini 3.1 Flash-Lite branding featuring a multi-colored spark icon and a stylized G composed of blue dots on a black background. (Source: Google)

Written by
Liz Ticong
Mar 4, 2026
3-minute read
eWeek content and product recommendations are editorially independent. We may earn money when you click on links to our partners.

Google is pushing AI speed and scale further with a new lightweight model built for massive workloads. The company has launched Gemini 3.1 Flash-Lite, calling it its fastest and most cost-efficient model for high-volume AI tasks.

Google says the model is intended for developers running high-frequency AI operations and real-time services that require fast responses across large volumes of requests.

Built for scale, priced for production

Gemini 3.1 Flash-Lite enters the Gemini 3 family as a streamlined model tailored for high-throughput environments where speed and efficiency are critical. The model was designed to support large-scale deployments without the overhead typically associated with larger models.

The model is launching first in preview. Developers can access it through the Gemini API in Google AI Studio, and enterprise teams through Vertex AI, so organizations can begin testing it on real workloads as Google expands the Gemini 3 series.

Speed and savings in the same package

Google has also detailed the pricing and performance improvements behind Flash-Lite’s design. The model is priced at $0.25 per one million input tokens and $1.50 per one million output tokens, a structure designed to keep costs manageable for applications that process requests at a large scale.
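
To see what those rates mean for a concrete workload, the sketch below multiplies token counts by the published per-million prices. The workload (one million requests at roughly 500 input and 100 output tokens each) and the helper name `estimate_cost` are hypothetical, and the calculation ignores billing details the article does not cover, such as how reasoning tokens, caching, or batch discounts are charged.

```python
# Preview prices reported for Gemini 3.1 Flash-Lite (USD per one million tokens).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost from token counts (hypothetical helper; ignores other billing factors)."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical workload: one million requests, ~500 input and ~100 output tokens each.
requests = 1_000_000
total = estimate_cost(500 * requests, 100 * requests)
print(f"${total:,.2f}")
```

Under those assumptions the run comes to about $275, with output tokens accounting for more than half of the bill even though they make up only one sixth of the tokens processed.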

On the performance side, the company reports a 2.5x faster time to first token and 45% faster output speed compared with Gemini 2.5 Flash, helping applications deliver responses more quickly once a prompt is submitted.

Those improvements are particularly relevant for systems that handle continuous streams of prompts, such as automated moderation, large-scale translation, or other high-volume services, where even modest gains in response speed can accumulate across millions of interactions.
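
A rough way to see how the two reported ratios combine is to model a response's total time as the time to first token plus the streaming time for the rest of the output. The baseline figures below (1 second to first token, 200 tokens per second, 300-token answers) are hypothetical placeholders, not published Gemini 2.5 Flash numbers, and the sketch assumes "2.5x faster" means the first-token wait is divided by 2.5 and "45% faster" means output throughput is multiplied by 1.45.

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate end-to-end time: wait for the first token, then stream the output."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline for a Gemini 2.5 Flash-style deployment (NOT published figures).
BASE_TTFT_S = 1.0
BASE_TOKENS_PER_S = 200.0
OUTPUT_TOKENS = 300

# Apply the reported ratios: 2.5x faster time to first token, 45% faster output speed.
new_ttft = BASE_TTFT_S / 2.5
new_speed = BASE_TOKENS_PER_S * 1.45

before = response_time(BASE_TTFT_S, BASE_TOKENS_PER_S, OUTPUT_TOKENS)
after = response_time(new_ttft, new_speed, OUTPUT_TOKENS)
saved_per_request = before - after
print(f"before={before:.3f}s after={after:.3f}s saved={saved_per_request:.3f}s")

# Waiting time saved, summed over a hypothetical one million requests, in hours.
print(f"{saved_per_request * 1_000_000 / 3600:,.1f} hours")
```

With these placeholder numbers, each response drops from 2.5 seconds to about 1.43 seconds, and the savings summed over a million requests come to roughly 296 hours of waiting time. Real gains depend on prompt length, output length, and network conditions.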

A closer look at the scorecard

Gemini 3.1 Flash-Lite also holds up well in industry benchmarks that test reasoning and multimodal understanding. The model recorded an Elo score of 1432 on the Arena.ai leaderboard, a ranking system that compares AI models based on head-to-head performance.
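
For readers unfamiliar with Elo-style ratings, the standard Elo model turns a rating gap into an expected head-to-head win rate. The sketch below implements that textbook formula. Arena's own methodology may differ (leaderboards of this kind commonly fit a Bradley-Terry model and report it on an Elo-like scale), and the 1400-rated opponent is hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: the chance that A is preferred over B in one comparison."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Flash-Lite's reported 1432 against a hypothetical model rated 1400.
print(f"{expected_score(1432, 1400):.3f}")
```

A 32-point lead works out to an expected preference rate of roughly 55%, so a gap of that size implies a modest edge rather than a decisive one.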

In academic-style evaluations, Flash-Lite scored 86.9% on GPQA Diamond, a benchmark of graduate-level science questions that require expert reasoning, and 76.8% on MMMU-Pro, which measures how well models interpret and reason across text and images.

According to Google, those results place Flash-Lite ahead of several models in the same category and even above some larger Gemini models from earlier generations.

Flash-Lite begins its real-world tests

Google is also giving developers more control over how the model approaches different tasks. Gemini 3.1 Flash-Lite introduces adjustable “thinking levels,” so teams can tune how much reasoning the system applies before generating a response.
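
The article does not show how thinking levels are set. The sketch below assumes the Gemini API's REST `generateContent` request shape with a `generationConfig.thinkingConfig.thinkingLevel` field, as documented for other Gemini 3 models; the level names and which ones Flash-Lite accepts are assumptions to check against Google's current API reference. `build_generate_request` is a hypothetical helper that only builds the JSON body and makes no network call.

```python
import json


def build_generate_request(prompt: str, thinking_level: str) -> dict:
    """Build a generateContent-style JSON body (field names assumed, not confirmed for Flash-Lite)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed field: lower levels trade reasoning depth for lower latency and cost.
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }


# A latency-sensitive tagging call might use a low level...
fast = build_generate_request("Tag this product: red wool scarf", "low")
# ...while a harder extraction task might use a higher one.
deep = build_generate_request("Summarize the contract's termination clauses", "high")
print(json.dumps(fast, indent=2))
```

Keeping the level as a per-request parameter lets a single deployment route cheap, high-volume calls and occasional harder ones through the same model.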

Early access partners have already begun testing the model in production-style environments. Companies including Latitude, Cartwheel, and Whering are experimenting with Flash-Lite in their applications, with developers highlighting consistent structured outputs and reliable instruction-following.

In one example, Whering reported 100% consistency in item tagging when using the model for product classification. Another early tester said Flash-Lite delivered sub-10-second completions with near-instant streaming and roughly 97% structured output compliance during initial deployments.
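
The article does not define how "structured output compliance" was measured. One plausible reading, sketched below under that assumption, is the share of responses that parse as a JSON object and contain every field a schema requires. The two-field schema, the helper names, and the sample outputs are all hypothetical.

```python
import json

# Hypothetical schema for a product-tagging task.
REQUIRED_KEYS = {"item", "category"}


def is_compliant(raw: str) -> bool:
    """True if the raw output parses as a JSON object containing every required key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)


def compliance_rate(outputs: list[str]) -> float:
    """Fraction of outputs that pass the schema check (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_compliant(o) for o in outputs) / len(outputs)


# Hypothetical sample outputs, not real model responses.
samples = [
    '{"item": "scarf", "category": "accessories"}',
    '{"item": "boots", "category": "shoes"}',
    '{"item": "jacket"}',             # missing "category"
    'Sure! Here is the JSON: {...}',  # not valid JSON
]
print(f"{compliance_rate(samples):.0%}")
```

Two of the four samples pass, giving 50%. A production check would typically also validate value types and allowed categories, not just key presence.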

With preview access now underway, Google is inviting developers to begin experimenting with Flash-Lite at scale.

Liz Ticong

Liz Ticong is a tech industry expert with hands-on experience in AI, software testing, and product analysis. Specializing in AI news, software reviews, and buyer’s guides, she rigorously tests and experiments with the latest AI and tech tools to provide in-depth, practical insights. As a contributor to eWeek and TechRepublic, she simplifies complex topics, helping readers make well-informed decisions.

Owned by TechnologyAdvice. © 2026 TechnologyAdvice. All rights reserved.

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may affect how and where products appear on this site, including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.