OpenAI Introduces Realtime Voice AI for Translation, Transcription, and Task Handling



Image: Zulfugar Karimov/Unsplash

Written by Aminu Abdullahi
May 11, 2026
3 minute read
eWeek content and product recommendations are editorially independent. We may make money when you click on links to our partners.

AI voice agents are getting closer to doing more than waiting their turn to speak.

OpenAI announced Thursday that it is expanding its Realtime API with GPT-Realtime-2, a new voice model the company says brings “GPT-5-class reasoning” to live conversations. The release is aimed at developers building AI agents that can listen, respond, translate, transcribe, and take actions while a conversation is still unfolding.

According to OpenAI, the goal is to move “from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds.”

The launch signals a shift in how developers build AI voice agents. Instead of stitching together separate tools for hearing, thinking, and speaking, GPT-Realtime-2 handles the entire process within a single loop. Cutting out those hand-offs allows for lower latency and more nuanced interactions, such as adjusting tone based on the user's emotions.

Specialized tools for translation and text

Alongside its main reasoning model, OpenAI introduced GPT-Realtime-Translate and GPT-Realtime-Whisper to handle specific high-speed tasks. The translation model is built to keep up with a speaker, supporting more than 70 input languages and 13 output languages. This feature is aimed at industries like education and customer service, where live, cross-language communication is vital.

The third addition, GPT-Realtime-Whisper, focuses on “streaming speech-to-text.” According to OpenAI, this model “transcribes audio as people speak, so live products can feel faster, more responsive, and more natural—from captions that appear in the moment, to meeting notes that keep up with the conversation.”
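To make the streaming idea concrete, here is a minimal Python sketch of how a captioning client might assemble partial transcripts as they arrive. It assumes a simplified, hypothetical event shape (dicts with a `type` of `"delta"` or `"completed"` and a `text` field); the actual Realtime API event names and payloads differ and should be taken from OpenAI's documentation.

```python
from dataclasses import dataclass, field


@dataclass
class LiveCaption:
    """Accumulates streamed transcript fragments into a running caption.

    The event format used here is a hypothetical simplification,
    not the real Realtime API schema.
    """
    finalized: list[str] = field(default_factory=list)  # completed segments
    partial: str = ""  # text of the segment still being spoken

    def handle(self, event: dict) -> str:
        kind = event.get("type")
        if kind == "delta":
            # Append the new fragment to the in-progress segment.
            self.partial += event["text"]
        elif kind == "completed":
            # Prefer a final text if the service sends one; otherwise
            # keep whatever was accumulated from the deltas.
            text = event.get("text", self.partial)
            self.finalized.append(text.strip())
            self.partial = ""
        # Unknown event types are ignored; the caption is re-rendered either way.
        return self.render()

    def render(self) -> str:
        pending = self.partial.strip()
        parts = self.finalized + ([pending] if pending else [])
        return " ".join(parts)


if __name__ == "__main__":
    cap = LiveCaption()
    for ev in [
        {"type": "delta", "text": "Hello"},
        {"type": "delta", "text": " every"},
        {"type": "delta", "text": "one"},
        {"type": "completed"},
        {"type": "delta", "text": "Next"},
    ]:
        print(cap.handle(ev))
```

The caption updates on every fragment, which is what lets on-screen text "keep up" with the speaker; a final `completed` event can then replace the provisional text with a corrected version.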

Real-world applications and enterprise use

Several major companies have already begun integrating these models into their platforms. Early testers include the real estate marketplace Zillow, travel site Priceline, and European carrier Deutsche Telekom.

For example, Zillow is developing an assistant that can reason through complex voice searches, such as finding homes within a specific budget, avoiding busy streets, and scheduling weekend tours. Meanwhile, Vimeo is using the translation model to provide live updates for product education videos in multiple languages.

To support these more complex agentic workflows, OpenAI has expanded the context window for its flagship voice model from 32K to 128K tokens. This allows the AI to remember much more of a conversation, a significant change for engineers who previously had to build manual resets to keep voice sessions from failing.
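The practical effect of a larger window can be illustrated with a simple trimming routine of the kind developers might use to keep a session under its limit. This is a hypothetical sketch, not OpenAI's mechanism: the per-turn token counts are invented for illustration, and 32K and 128K are treated as round numbers (32,000 and 128,000).

```python
from collections import deque


def trim_history(turns: list[tuple[str, int]], budget_tokens: int) -> list[tuple[str, int]]:
    """Keep the most recent turns whose combined token count fits the budget.

    turns: (role, token_count) pairs, oldest first. Token counts are
    hypothetical inputs; a real client would measure them.
    Returns the retained turns, still oldest first.
    """
    kept: deque[tuple[str, int]] = deque()
    total = 0
    # Walk backward from the newest turn so recent context survives.
    for role, tokens in reversed(turns):
        if total + tokens > budget_tokens:
            break  # everything older than this turn is dropped
        kept.appendleft((role, tokens))
        total += tokens
    return list(kept)


if __name__ == "__main__":
    # 40 alternating turns of 2,000 tokens each: 80,000 tokens in total.
    history = [("user" if i % 2 == 0 else "assistant", 2_000) for i in range(40)]
    print(len(trim_history(history, 32_000)))   # turns kept under a 32K budget
    print(len(trim_history(history, 128_000)))  # turns kept under a 128K budget
```

With 40 turns of 2,000 tokens each, a 32,000-token budget retains only the 16 most recent turns, while a 128,000-token budget keeps the whole conversation, so no trimming or reset is needed at that length.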


Pricing and safety guardrails

The new models use two pricing schemes. GPT-Realtime-2 is billed by token, at $32 per million audio input tokens and $64 per million output tokens. The specialized models are billed by the minute, with GPT-Realtime-Translate costing $0.034 per minute and GPT-Realtime-Whisper priced at $0.017 per minute.
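Because GPT-Realtime-2 is billed per token and the specialized models per minute, estimating a workload's cost takes two different calculations. The sketch below uses only the rates reported in this article; the function names are hypothetical, and it does not try to convert audio minutes into tokens, since that ratio is not given here.

```python
# Rates as reported in this article, in USD. Names are illustrative only.
REALTIME2_INPUT_PER_M = 32.00   # per 1M audio input tokens
REALTIME2_OUTPUT_PER_M = 64.00  # per 1M output tokens
TRANSLATE_PER_MIN = 0.034       # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017         # GPT-Realtime-Whisper, per minute


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-based cost of a GPT-Realtime-2 workload in USD."""
    return (input_tokens * REALTIME2_INPUT_PER_M
            + output_tokens * REALTIME2_OUTPUT_PER_M) / 1_000_000


def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Duration-based cost for the per-minute models in USD."""
    return minutes * rate_per_min


if __name__ == "__main__":
    print(f"GPT-Realtime-2:         ${realtime2_cost(500_000, 250_000):.2f}")
    print(f"1 hour of translation:  ${per_minute_cost(60, TRANSLATE_PER_MIN):.2f}")
    print(f"1 hour of transcription: ${per_minute_cost(60, WHISPER_PER_MIN):.2f}")
```

For example, a session consuming 500,000 audio input tokens and 250,000 output tokens would cost $16 + $16 = $32 on GPT-Realtime-2, while an hour of live translation would come to about $2.04 and an hour of streaming transcription about $1.02.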

OpenAI has also implemented safety measures to prevent the tools from being used for fraud or spam. The company has embedded triggers so that “conversations can be halted if they are detected as violating our harmful content guidelines.” The Realtime API also supports EU Data Residency for applications based in Europe and follows the company’s existing enterprise privacy commitments.

Developers can begin testing the new models immediately through the OpenAI Playground or the company’s Codex platform.

Related reading: For more on OpenAI’s latest model updates, read our coverage of GPT-5.5 Instant becoming ChatGPT’s default model.

Aminu Abdullahi

Aminu Abdullahi is an experienced B2B technology and finance writer and award-winning public speaker. He is the co-author of the e-book, The Ultimate Creativity Playbook, and has written for various publications, including TechRepublic, eWEEK, Enterprise Networking Planet, eSecurity Planet, CIO Insight, Enterprise Storage Forum, IT Business Edge, Webopedia, Software Pundit, Geekflare and more.



Owned by TechnologyAdvice. © 2026 TechnologyAdvice. All rights reserved.

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may affect how and where products appear on this site, including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.