This $800M Startup Makes ChatGPT 24x Faster

This $800M Startup Makes ChatGPT 24x Faster

The Neuron featured image for vLLM

Image: The Neuron

Verfasst von
Grant Harvey
Grant Harvey
Jan 23, 2026
2 minute read
eWeek Inhalte und Produktempfehlungen sind redaktionell unabhängig. Wir können Geld verdienen, wenn Sie auf Links zu unseren Partnern klicken. Mehr erfahren

Every time ChatGPT takes three seconds to respond instead of 30, there’s probably infrastructure like vLLM working behind the scenes. 

You’ve been using it without knowing it. And now, the team behind it just became an $800 million company overnight.

Here are the details

Today, Inferact launched with a massive $150 million seed round to commercialize the open-source inference engine that’s already powering AI at Amazon, major cloud providers, and thousands of developers worldwide. Andreessen Horowitz and Lightspeed led the round, with participation from Sequoia, Databricks, and others.

What actually is vLLM? Think of it as the difference between a traffic jam and an AI highway system. When you ask ChatGPT a question, your request goes through an “inference” process — the model generates your answer, word by word. vLLM makes that process drastically faster and cheaper through two key innovations:

PagedAttention: Manages memory like your computer handles RAM, cutting waste by up to 24x compared to traditional methods.

Continuous batching: Instead of processing one request at a time, vLLM handles multiple requests simultaneously, like a restaurant serving 10 tables at once instead of waiting for each person to finish before seating the next.

Companies using vLLM report inference speeds 2-24x faster than standard implementations, with dramatically lower costs. The project has attracted over 2,000 code contributors since launching in 2023 from UC Berkeley’s Sky Computing Lab.

vLLM X post about startup raise for AI.
Image: X

Image: X

Why this matters

AI is shifting from a training problem to a deployment problem.

Building a smart model is no longer the bottleneck (all the main models are good), running it affordably at scale is. As companies move from experimenting with ChatGPT to deploying AI across millions of users this year, inference optimization becomes the difference between profit and bankruptcy.

Expect every major AI company to obsess over inference economics in 2026. The winners won’t necessarily be the smartest models, but the ones that can serve predictions fast enough and cheaply enough to actually make money.

For you: If your company is evaluating AI tools, ask vendors about their inference infrastructure. Tools built on engines like vLLM will scale more cost-effectively than proprietary solutions that haven’t solved this problem. The open-source advantage here is real… and now, venture-backed.

Editor’s note: This content originally ran in the newsletter of our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter here.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.

eWeek Logo

eWeek has the latest technology news and analysis, buying guides, and product reviews for IT professionals and technology buyers. The site's focus is on innovative solutions and covering in-depth technical content. eWeek stays on the cutting edge of technology news and IT trends through interviews and expert analysis. Gain insight from top innovators and thought leaders in the fields of IT, business, enterprise software, startups, and more.

Eigentum von TechnologyAdvice. © 2026 TechnologyAdvice. Alle Rechte vorbehalten

Werbetreibenden-Offenlegung: Einige der auf dieser Website erscheinenden Produkte stammen von Unternehmen, von denen TechnologyAdvice eine Vergütung erhält. Diese Vergütung kann beeinflussen, wie und wo Produkte auf dieser Website erscheinen, einschließlich beispielsweise der Reihenfolge, in der sie erscheinen. TechnologyAdvice schließt nicht alle Unternehmen oder alle auf dem Marktplatz verfügbaren Produkttypen ein.