
ZDNET's key takeaways
- OpenAI's Realtime API is out of beta and now generally available.
- You can try its latest speech-to-speech model, gpt-realtime.
- The upgrades improve OpenAI's voice offerings for developers.
This year, AI agents that can carry out tasks on behalf of users have been a major focus, with companies constantly developing offerings that reduce users' workloads. To make these interactions as seamless as possible, many companies are betting on multimodal AI agents, and OpenAI just made building those products even easier.
Also: 3 smart ways business leaders can create successful AI strategies - before it's too late
OpenAI updated its Realtime API on Thursday with new features that let developers and enterprises build more reliable voice agents, and the API is now generally available, according to the company. OpenAI first launched the Realtime API in public beta in October 2024. The company also released its most advanced speech-to-speech model yet, called gpt-realtime.
Realtime API update
- What: According to the company, the Realtime API upgrade includes support for remote MCP servers, image input, and phone calling through SIP. During a livestream for the announcement, OpenAI noted that MCP is well suited to voice commands, enabling users to take real action in connected apps (a code sketch follows this list).
- Why it matters: Ultimately, these expanded capabilities should give voice agents access to more tools and more context to help users. AI tools are only as helpful as the information they can reach, so streamlining the process of connecting the AI model to data sources is a major win for developers and users. Importantly, the open MCP standard helps ensure those connections are made in a way that respects users' data and privacy.
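For developers curious what that looks like in practice, here is a minimal sketch of opening a Realtime API session over WebSocket and attaching a remote MCP server as a tool. It assumes the event shape documented during the public beta; the "mcp" tool fields, the server label, and the example server URL are illustrative assumptions, so check OpenAI's current API reference before relying on them.

```python
# Minimal sketch: configure a Realtime API session with a remote MCP server.
import asyncio
import json
import os

import websockets  # pip install websockets

API_KEY = os.environ["OPENAI_API_KEY"]
URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime"

async def main() -> None:
    # "additional_headers" is the parameter name in recent websockets
    # releases; older versions call it "extra_headers".
    async with websockets.connect(
        URL, additional_headers={"Authorization": f"Bearer {API_KEY}"}
    ) as ws:
        # Attach a remote MCP server so the voice agent can call its tools
        # mid-conversation. The "mcp" tool shape and the server URL are
        # illustrative assumptions, not confirmed API.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": "marin",  # one of the two new voices
                "instructions": "You are a helpful voice assistant.",
                "tools": [{
                    "type": "mcp",
                    "server_label": "calendar",               # hypothetical
                    "server_url": "https://example.com/mcp",  # hypothetical
                }],
            },
        }))
        # The server acknowledges the config with a session.updated event.
        print(json.loads(await ws.recv())["type"])

asyncio.run(main())
```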
A new speech-to-speech model
- What: OpenAI debuted its new gpt-realtime model as the company's "most advanced, production-ready voice model." Upgrades include improved intelligence, better compliance with complex instructions, and function calling. It can even switch languages mid-sentence.
A demo of the model showed just how human-like it sounds, complete with a delivery that conveys a wide range of emotions. The model, which was stress-tested on various evaluations, also appeared to follow instructions successfully: an OpenAI employee attempted a jailbreak that ran contrary to the system prompt, but gpt-realtime calmly refused and did not give in to the attempt. It also analyzed a picture and talked about what it was seeing.
OpenAI also added two new voices, Cedar and Marin, which are exclusively available in the API.
- Why it matters: Models that sound natural and can actually help with tasks are a major pillar of worthwhile voice assistants and interactions. If the new model works as claimed, it will enable a better experience for users (see the function-calling sketch below).
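To illustrate the function-calling upgrade, here is a similar sketch that declares a function the model may call during a conversation and listens for the resulting event. The get_weather tool is hypothetical, and the event and field names again follow the public beta docs rather than a confirmed GA schema.

```python
# Minimal sketch: give gpt-realtime a callable function, then catch its call.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime"

async def main() -> None:
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    async with websockets.connect(URL, additional_headers=headers) as ws:
        # Declare a function the model may call; get_weather is hypothetical.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": "cedar",  # the other new voice
                "tools": [{
                    "type": "function",
                    "name": "get_weather",
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }],
            },
        }))
        async for raw in ws:
            event = json.loads(raw)
            # When the model decides to call the function, it streams the
            # arguments and signals completion with this event type (per the
            # beta docs; verify against the current reference).
            if event["type"] == "response.function_call_arguments.done":
                args = json.loads(event["arguments"])
                print("Model asked for weather in:", args.get("city"))
                break

asyncio.run(main())
```

In a real agent, the app would run the requested function and send its result back into the session so the model can speak the answer; this sketch stops at receiving the call to stay short.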