
Not long ago, generative AI could only communicate with human users through text. Now it is being given the power of realistic speech – and that capability is improving by the day.
On Thursday, AI voice platform ElevenLabs debuted Eleven v3, described on the company’s website as “the most expressive text-to-speech model” it has released to date. The new model can convey a wide range of emotions and subtle communicative quirks – such as sighs, laughter, and whispering – making its speech more humanlike than the company’s previous models.
Also: Can WWDC be Apple’s AI turning point? Here’s what analysts are predicting
In one demo shared on X, v3 was shown generating a conversation between two characters, one male and one female, about their newfound ability to speak in more humanlike voices.
Introducing Eleven v3 (alpha) – the most expressive text-to-speech model.
Supports 70+ languages, multi-speaker dialogue, and audio tags such as (excited), (sighs), (laughing), and (whispers).
Now in public alpha and 80% off in June. pic.twitter.com/n56Bersduc – ElevenLabs (@elevenlabsio) June 5, 2025
Granted, the tone has none of the Alexa-esque flatness, but the v3-generated voices are almost excessively animated, to the point that their laughter is more creepy than charming. Listen for yourself.
The model can also speak more than 70 languages, compared with its predecessor’s limit of 29. It is now available in public alpha, and its price has been cut by 80% through the end of this month.
Future of AI Interaction
AI-generated voice has become a major focus of innovation as tech developers look to the future of human-machine interaction.
Voice assistants like Siri and Alexa have long been able to speak, of course, but as anyone who uses these systems regularly can attest, their voices are rather mechanical, with a narrow range of emotional rhythm and tone. They are useful for handling quick and simple tasks, such as playing a song or setting an alarm, but they do not make great conversation partners.
Some of the latest text-to-speech (TTS) AI tools, on the other hand, have been engineered to speak in voices that are maximally realistic and engaging.
Also: You shouldn’t rely on AI for therapy – here’s why
Users can prompt v3, for example, to speak in voices that are easily adjustable through the use of “audio tags.” Think of these as stylistic filters that modify the output, and which can be inserted directly into the text: “excited,” “loudly,” “sings,” “laughing,” “angry,” and so on.
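For a concrete sense of how tags like these might be wired into an application, here is a minimal sketch of a request to ElevenLabs’ standard text-to-speech API with tags embedded in the input text. The endpoint and xi-api-key header are part of the company’s existing API; the model ID “eleven_v3,” the placeholder credentials, and the exact tag syntax are assumptions based on the demo described above, not confirmed details.

# A minimal sketch: sending tagged text to the ElevenLabs text-to-speech
# API. The model ID "eleven_v3" and the parenthetical tag syntax are
# assumptions based on the demo above; check the official docs for exact names.
import requests

API_KEY = "your-elevenlabs-api-key"  # hypothetical placeholder
VOICE_ID = "your-voice-id"           # hypothetical placeholder

# Audio tags are written inline, directly in the text to be spoken.
text = "(excited) We can finally talk like real people! (laughing) Well, almost."

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": text, "model_id": "eleven_v3"},
)
response.raise_for_status()

# The endpoint returns audio bytes (MP3 by default), ready to save or stream.
with open("output.mp3", "wb") as f:
    f.write(response.content)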
ElevenLabs is not the only company racing to create more lifelike TTS models, which tech companies are pitching as a more natural and accessible way to interact with AI.
In late May, ElevenLabs competitor Hume AI unveiled its Empathic Voice Interface (EVI) 3 model, which allows users to generate custom voices by describing them in natural language. Similarly, Google’s Gemini 2.5 Pro and Flash models now offer native audio capabilities.
Want more stories about AI? Sign up for Innovation, our weekly newsletter.