
The AI-generated video space is moving fast, with leading tech developers racing to build and commercialize their own models. We are now seeing the rise of tools that can generate strikingly photorealistic video from a single natural-language prompt. For the most part, however, AI-generated video has had one glaring drawback: it is silent.
No more. At its annual I/O developer conference on Tuesday, Google announced the release of the latest iteration of its video-generating AI model, Veo 3, which also comes with the ability to generate synchronized audio.
For example, say you prompt the system to generate a video set inside a busy subway car. Veo 3 can produce the video along with AI-generated ambient background noise that adds to the sense of realism. You can also prompt it to generate audio of human voices, according to Google.
The model also excels at simulating real-world physics and lip-syncing, making it a potentially valuable tool for filmmakers and furthering Google's broader mission of making AI useful in creative industries. It is available now to Gemini Ultra subscribers in the US. It can also be accessed through Flow, Google's new AI-powered filmmaking tool, which was also unveiled at I/O this week.
A major technical challenge
Veo 3 is one of the first models from a major tech developer that can synchronize AI-generated video and audio. Meta's Movie Gen, released in October, is another. Some other tools, such as Runway's Gen-3 Alpha, come with features that let users add AI-generated audio to video in post-production, but generating both at the same time requires enormous computing power and resources of the kind Google has.
Building AI models capable of generating synchronized video and audio is a thorny technical challenge and an active area of research in the AI industry. AI-generated video and AI-generated audio are each distinct technical challenges, and fusing them introduces a new dimension of complexity. A demo of Veo 3 is available here:
https://www.youtube.com/watch?v=94kmlfyiao88
For one thing, video is a series of still frames, while audio is a continuous wave. Generating the two together therefore requires models that can operate across both forms and account for the different timescales at which each works.
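To make the timescale gap concrete, here is a minimal Python sketch that maps one video frame to the span of audio samples that play while it is on screen. The function name and the default values (24 frames per second, 48,000 audio samples per second) are illustrative assumptions chosen because they are common in film and digital audio; they are not figures Google has published for Veo 3.

```python
def samples_for_frame(frame_index: int, fps: int = 24, sample_rate: int = 48_000) -> range:
    """Return the audio sample indices that play while one video frame is shown.

    fps and sample_rate are assumed, illustrative values, not Veo 3 settings.
    """
    # Integer arithmetic keeps frame boundaries aligned even when the
    # sample rate is not an exact multiple of the frame rate.
    start = frame_index * sample_rate // fps
    end = (frame_index + 1) * sample_rate // fps
    return range(start, end)


if __name__ == "__main__":
    first = samples_for_frame(0)
    print(f"Frame 0 spans {len(first)} audio samples")
```

Under these assumed rates, a single frame covers 2,000 audio samples, so a model generating both has to keep thousands of audio values consistent with each image it produces.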
An AI model that fuses video with sound must also be able to account dynamically for variables such as material, distance, and speed. A car moving at 100 mph sounds different from one moving at 10 mph. A horse galloping on cobblestones sounds different from one walking on grass.