
If you've ever felt the urge to chat with an AI version of yourself, now you can – kind of.
On Thursday, AI startup Hume announced the launch of a new "hyperrealistic voice cloning" feature for the latest iteration of its Empathic Voice Interface (EVI) model, EVI 3, which was unveiled last month. The idea is that by uploading a short audio recording of yourself speaking – ideally between 30 and 90 seconds – the model can quickly churn out an AI replica of your voice, which you can then converse with out loud, much as you would with another person.
Also: Text-to-speech with feeling – this new AI model does everything but shed a tear
I uploaded a recording of my voice to EVI 3 and spent some time chatting with the model's mimicry of me. I was expecting (perhaps naively) an uncanny valley experience – that exceedingly rare sensation of interacting with something that feels almost completely real, yet is just off enough to be unsettling – and was disappointed when EVI 3's clone came across more like a cartoonish audio caricature of my voice.
Let me unpack that a little.
Using EVI 3's voice cloning feature
The copy of my voice was, in some ways, undeniably realistic. It seemed to pause while speaking in more or less the same way I do, with a touch of my familiar vocal fry. But the mirroring stopped there.
Hume claimed in its blog post that EVI 3's new voice cloning feature can capture "aspects of the speaker's personality." That's a vague promise (perhaps intentionally so), but in my own tests, the model fell short in this regard. Far from feeling like a faithful simulation of my behavioral quirks and sense of humor, the model spoke with a chipper, eager-to-please tone that would be well suited to a radio advertisement for antidepressants. I like to think of myself as friendly and generally upbeat, but the AI was clearly exaggerating those particular character traits.
Also: Fighting AI with AI, finance firms prevented $5 million in fraud – but at what cost?
Despite its generally puppy-like demeanor, the model was strangely reluctant to speak in an accent, which struck me as exactly the kind of playful vocal exercise it would excel at. When I asked it to take a stab at an Australian accent, it said "G'day" and "mate" once or twice in its normal voice, then immediately backed away from anything more adventurous. And no matter what I prompted it to talk about, it found some creative and roundabout way to circle back to the subject I had discussed while recording my voice sample – reminiscent of an experiment from Anthropic last year in which Claude was modified to become obsessed with the Golden Gate Bridge.
For example, in my second test, I recorded myself talking about Led Zeppelin, which I had been listening to earlier that morning. When I then asked EVI 3's voice clone to explain its thoughts on the nature of dark matter, it quickly found a way to steer its response back to music, mysteriously comparing the cosmos to an abstract raga and dark matter's invisible pull to the meaning and power of a song.
You can try EVI 3's new voice cloning feature for yourself here.
According to Hume's website, user data generated from interactions with the EVI API is collected and anonymized by default to train the company's models. You can turn this off through the "zero data retention" feature in your profile. For non-API products, including the demo linked above, the company says it "may" use data to improve its models – but again, you can opt out if you create a personal profile.
Whispering robots
AI voices have been around for quite some time, but they've historically been limited in their realism; it's very obvious that you're talking to a robot when you get responses from classic Siri or Alexa, for example. By contrast, a new wave of AI voice models, EVI 3 among them, aims not only to speak in natural language but also – and more importantly – to mimic the subtle cadences, intonations, idiosyncrasies, and tics that characterize real, everyday human speech.
"A huge part of human communication is emphasizing the right words, pausing at the right times, using the right tone of voice," Hume CEO and chief scientist Alan Cowen told me.
As Hume wrote in a blog post on Thursday, EVI 3 "knows what words to emphasize, what makes people laugh, and how accents and other voice characteristics interact with vocabulary." According to the company, this represents a major technical leap over earlier speech-generating models, which "lack a meaningful understanding of language."
Many AI experts take umbrage at the use of words like "understanding" in this context, since models like EVI 3 are trained merely to detect and recreate patterns gleaned from huge troves of training data – a process that arguably leaves no room for anything we would recognize as true comprehension.
Also: ChatGPT isn't just for chatting anymore – now it will do work for you
According to Hume's blog post, EVI 3 was trained on trillions of text tokens and then millions of hours of speech. According to Cowen, this approach has enabled the model to speak in voices that are far more realistic than one might expect – capable, he said, of doing much of what a human can do with their voice.
But philosophical arguments aside, the new wave of AI voice models is undeniably impressive. When prompted, they can explore a much more expansive range of vocal expression than their predecessors. Companies like Hume and ElevenLabs claim these new models will have practical benefits for industries like entertainment and marketing, but some experts fear they will open new doors for deception – as was illustrated just last week, when an unknown person used AI to mimic the voice of US Secretary of State Marco Rubio and then deployed the voice clone to contact government officials.
"I don't see any reason why we would need robots that sound human," Emily M. Bender, a linguist and coauthor of The AI Con, recently told me. "Like, what is that for? Apart from disguising the fact that what you're hearing is synthetic?"
The revolutionary becomes routine
Yes, EVI 3's voice cloning feature, like all AI tools, has its shortcomings. But those are far outweighed by its remarkable qualities.
For one thing, we must remember that the generative AI models hitting the market today represent the technology's infancy, and they will only continue to improve. In less than three years, we've gone from the public release of ChatGPT to AI models that can more or less convincingly mimic real human voices, and to tools like Google's Veo 3, which can generate realistic video with synchronized audio. The breathtaking pace of generative AI's progress should give us pause, to say the least.
Also: AWS VP says
Today, EVI 3 can produce a rough approximation of your voice. It isn't unreasonable to expect, though, that its successor – or perhaps its successor's successor – will be able to capture your voice in a way that feels genuinely convincing. In such a world, one can imagine EVI or a similar voice-generating model being paired with an AI agent that could, say, join a Zoom meeting on your behalf. That, too, could less optimistically be a scam artist's dream come true.
Perhaps the most striking thing about my experience with EVI 3's voice cloning feature, however, is how routine this technology already feels.
As the pace of technological innovation accelerates, so does our ability to quickly normalize feats that would have left previous generations of humans in stunned silence. OpenAI's Sam Altman recently made this point in a blog post: according to Altman, we are approaching the singularity, yet for the most part, it looks like business as usual.
Want more stories about AI? Sign up for Innovation, our weekly newsletter.

