OpenAI Introduces Realtime Voice AI for Translation, Transcription, and Task Handling


Image: Zulfugar Karimov/Unsplash

Written by Aminu Abdullahi

May 11, 2026


eWeek content and product recommendations are editorially independent. We may earn money when you click on links to our partners.

AI voice agents are getting closer to doing more than waiting their turn to speak.

OpenAI announced Thursday that it is expanding its Realtime API with GPT-Realtime-2, a new voice model the company says brings “GPT-5-class reasoning” to live conversations. The release is aimed at developers building AI agents that can listen, respond, translate, transcribe, and take actions while a conversation is still unfolding.

According to OpenAI, the goal is to move “from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds.”

The launch signals a shift in how developers build AI voice agents. Instead of stitching together separate components for hearing (speech-to-text), thinking (a language model), and speaking (text-to-speech), developers can rely on GPT-Realtime-2 to handle the entire process within a single loop. That design allows for lower latency and more nuanced interactions, such as adjusting tone based on the user's emotions.

Specialized tools for translation and text

Alongside its main reasoning model, OpenAI introduced GPT-Realtime-Translate and GPT-Realtime-Whisper to handle specific high-speed tasks. The translation model is built to keep up with a speaker, supporting more than 70 input languages and 13 output languages. This feature is aimed at industries like education and customer service, where live, cross-language communication is vital.

The third addition, GPT-Realtime-Whisper, focuses on “streaming speech-to-text.” According to OpenAI, this model “transcribes audio as people speak, so live products can feel faster, more responsive, and more natural—from captions that appear in the moment, to meeting notes that keep up with the conversation.”
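For developers wondering what "transcribes audio as people speak" looks like on the client side, here is a minimal sketch of a live-caption buffer. It assumes a hypothetical event stream in which partial text for a segment arrives as small deltas and a final event then delivers that segment's complete transcript. The `CaptionBuffer` class and its `on_delta`/`on_final` methods are illustrative names, not part of OpenAI's API, which defines its own event types and payloads.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionBuffer:
    """Assemble live captions from streaming transcription events.

    Hypothetical event model: partial text for a segment arrives as deltas,
    then a final event carries that segment's complete transcript. Real
    streaming speech-to-text APIs define their own event names and fields.
    """
    committed: list[str] = field(default_factory=list)      # finalized segments, in order
    pending: dict[str, str] = field(default_factory=dict)   # segment_id -> partial text so far

    def on_delta(self, segment_id: str, delta: str) -> None:
        # Append the new fragment to the in-progress text for this segment.
        self.pending[segment_id] = self.pending.get(segment_id, "") + delta

    def on_final(self, segment_id: str, text: str) -> None:
        # The final transcript replaces whatever partial text was on screen.
        self.pending.pop(segment_id, None)
        self.committed.append(text)

    def render(self) -> str:
        # Finalized text first, then any segments still being spoken.
        return " ".join(self.committed + list(self.pending.values()))


if __name__ == "__main__":
    buf = CaptionBuffer()
    buf.on_delta("seg-1", "Hello")
    buf.on_delta("seg-1", " wor")
    print(buf.render())  # Hello wor
    buf.on_final("seg-1", "Hello world.")
    buf.on_delta("seg-2", "How")
    print(buf.render())  # Hello world. How
```

Keeping partial and final text apart is what lets captions appear immediately and then settle into their corrected form once a segment is complete.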

Real-world applications and enterprise use

Several major companies have already begun integrating these models into their platforms. Early testers include the real estate marketplace Zillow, travel site Priceline, and European carrier Deutsche Telekom.

For example, Zillow is developing an assistant that can reason through complex voice searches, such as finding homes within a specific budget, avoiding busy streets, and scheduling weekend tours. Meanwhile, Vimeo is using the translation model to provide live updates for product education videos in multiple languages.

To support these more complex agentic workflows, OpenAI has expanded the context window for its flagship voice model from 32K to 128K tokens. This lets the model retain roughly four times as much of a conversation, a significant change for engineers who previously had to build manual resets to keep long voice sessions from failing.
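To see why the larger window matters, consider one common workaround for a fixed limit: sliding-window truncation, which drops the oldest turns once a session exceeds its token budget. The sketch below is a generic illustration, not OpenAI's mechanism or the specific resets the article mentions. The `ContextWindow` class and `simulate` helper are hypothetical, the code assumes each turn's token count is already known, and the 400-tokens-per-turn figure in the usage is made up.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str
    tokens: int  # assumed to be precomputed by the app's own tokenizer


class ContextWindow:
    """Sliding-window truncation: evict the oldest turns to stay under a token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.turns = deque()  # oldest turn on the left
        self.used = 0
        self.evicted = 0

    def add(self, turn: Turn) -> None:
        if turn.tokens > self.max_tokens:
            raise ValueError("a single turn exceeds the whole context budget")
        self.turns.append(turn)
        self.used += turn.tokens
        # Drop the oldest turns until the conversation fits again.
        while self.used > self.max_tokens:
            oldest = self.turns.popleft()
            self.used -= oldest.tokens
            self.evicted += 1


def simulate(max_tokens: int, n_turns: int, tokens_per_turn: int) -> tuple[int, int]:
    """Return (turns kept, turns evicted) after a session of uniform-size turns."""
    window = ContextWindow(max_tokens)
    for i in range(n_turns):
        role = "user" if i % 2 == 0 else "assistant"
        window.add(Turn(role, f"turn {i}", tokens_per_turn))
    return len(window.turns), window.evicted


if __name__ == "__main__":
    print(simulate(32_000, 200, 400))   # (80, 120)
    print(simulate(128_000, 200, 400))  # (200, 0)
```

With 200 turns of 400 tokens (80,000 tokens in total), a 32K budget keeps only the latest 80 turns and discards the first 120, while a 128K budget keeps the whole session.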

Pricing and safety guardrails

The new models use two billing schemes. GPT-Realtime-2 is billed by token, at $32 per million audio input tokens and $64 per million output tokens. The specialized models are billed by the minute: GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.
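The per-token and per-minute rates can be compared with a little arithmetic. The sketch below simply encodes the launch prices reported in this article. The token counts in the usage are hypothetical, since how many tokens a minute of conversation consumes depends on the model and the session.

```python
# Launch prices as reported in this article (USD). Check OpenAI's current
# pricing before using these figures for budgeting.
REALTIME_2_INPUT_PER_M = 32.00   # GPT-Realtime-2, per 1M audio input tokens
REALTIME_2_OUTPUT_PER_M = 64.00  # GPT-Realtime-2, per 1M output tokens
TRANSLATE_PER_MIN = 0.034        # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017          # GPT-Realtime-Whisper, per minute


def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-billed cost of a GPT-Realtime-2 session."""
    return (input_tokens * REALTIME_2_INPUT_PER_M
            + output_tokens * REALTIME_2_OUTPUT_PER_M) / 1_000_000


def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Minute-billed cost for the translation or transcription models."""
    return minutes * rate_per_min


if __name__ == "__main__":
    # A one-hour meeting, transcribed and translated.
    print(f"Whisper, 60 min:   ${per_minute_cost(60, WHISPER_PER_MIN):.2f}")    # $1.02
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MIN):.2f}")  # $2.04
    # Hypothetical token counts for one GPT-Realtime-2 agent session.
    print(f"Realtime-2:        ${realtime2_cost(100_000, 50_000):.2f}")         # $6.40
```

At these rates, live translation costs exactly twice as much per minute as streaming transcription, while GPT-Realtime-2 output tokens cost twice as much as input tokens.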

OpenAI has also implemented safety measures to prevent the tools from being used for fraud or spam. The company has embedded triggers so that “conversations can be halted if they are detected as violating our harmful content guidelines.” The Realtime API also supports EU Data Residency for applications based in Europe and follows the company’s existing enterprise privacy commitments.

Developers can begin testing the new models immediately through the OpenAI Playground or the company’s Codex platform.

Related reading: For more on OpenAI’s latest model updates, read our coverage of GPT-5.5 Instant becoming ChatGPT’s default model.

Aminu Abdullahi

Aminu Abdullahi is a B2C and B2B technology and finance writer with more than six years of experience covering enterprise IT, cybersecurity, cloud computing, artificial intelligence, fintech, business software, and emerging technologies. His work has appeared in publications including TechRepublic, eWEEK, Channel Insider, Geekflare, Enterprise Networking Planet, eSecurity Planet, CIO Insight, and Webopedia. With a technical background in computer science, he specializes in translating complex technology topics into clear, accessible content for business leaders and decision-makers.
