
As expected after days of online leaks and rumors, Google has unveiled Veo 3.1, its latest AI video generation model, bringing a suite of creative and technical upgrades aimed at improving narrative control, audio integration, and realism in AI-generated videos.
While the updates expand the possibilities for hobbyists and content creators using Flow, Google’s online AI creation app, the release also signals a growing opportunity for enterprises, developers, and creative teams that want scalable, customizable video tools.
The quality is better, the physics are better, the price is the same as before, and the controls and editing features are more robust and diverse.
My preliminary testing showed it to be a powerful and performant model that delights with nearly every generation. However, its look is more cinematic, polished, and a little more “artificial” by default than rivals like OpenAI’s new Sora 2, released late last month, which may or may not be what a particular user is after (Sora 2 excels at handheld, candid-style video).
Extended control over narrative and audio
Veo 3.1 builds on its predecessor, Veo 3 (released back in May 2025), with improved support for dialogue, ambient sounds, and other audio effects.
Native audio generation is now available across several key features in Flow, including “Frames to Video,” “Ingredients to Video,” and “Extend,” which respectively let users convert still images to video; use items, characters, and objects from multiple images in a single video; and generate clips beyond the initial 8 seconds, to 30 seconds or even a minute or more, by continuing from the last frame of a prior clip.
Previously, you had to manually add audio after using these features.
This combination gives users greater control over tone, emotion, and storytelling – capabilities that previously required post-production work.
In enterprise contexts, this level of control can reduce the need for separate audio pipelines, providing a unified way to create training content, marketing videos or digital experiences with synchronized sound and visuals.
Google noted in a blog post that the update reflects user feedback asking for deeper artistic controls and improved audio support, with Gallegos emphasizing the importance of making editing and refinement possible directly in Flow without having to reshoot scenes.
Better input and editing capabilities
With Veo 3.1, Google introduces support for multiple input types and more detailed control over the generated output. The model accepts text prompts, images, and video clips as input, and also supports:
- Reference images (up to three) to guide the appearance and style of the final output
- First-and-last-frame interpolation to generate a seamless transition between fixed endpoints
- Scene extension, which continues the action or motion of a video beyond its current duration
These tools are intended to give enterprise users a way to refine the look and feel of their content – useful for maintaining brand consistency or adhering to a creative brief.
Additional capabilities such as “Insert” (add objects to scenes) and “Remove” (remove elements or characters) are also being introduced, although not all are immediately available through the Gemini API.
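For teams planning to script these capabilities rather than work in the Flow UI, the image-conditioned path is likely the most common starting point. Below is a minimal sketch of an image-to-video request through the Gemini API using the google-genai Python SDK; the Veo 3.1 model ID shown is an assumption based on Google’s naming for earlier preview models, and the request/poll/download pattern mirrors how prior Veo versions are exposed, so verify names against the current documentation.

```python
# Minimal image-to-video sketch with the google-genai SDK.
# NOTE: the model ID below is an assumption for the Veo 3.1 preview;
# the request/poll/download flow mirrors earlier Veo releases.
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GOOGLE_API_KEY from the environment

# A still image (e.g. a product shot) used as the starting frame.
with open("product_shot.png", "rb") as f:
    first_frame = types.Image(image_bytes=f.read(), mime_type="image/png")

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed preview model ID
    prompt="Slow orbit around the product on a marble countertop, soft daylight",
    image=first_frame,
    config=types.GenerateVideosConfig(aspect_ratio="16:9"),
)

# Video generation is asynchronous: poll the long-running operation until it finishes.
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download the finished clip (Google stores it server-side for only two days).
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("veo_clip.mp4")
```

Reference images, first-and-last-frame interpolation, and the “Insert”/“Remove” edits described above are not shown here, since their exact API surface is still rolling out beyond Flow.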
Available across multiple platforms
Veo 3.1 is accessible through several of Google’s existing AI services:
- Flow, Google’s own interface for AI-assisted filmmaking
- The Gemini API, targeted at developers building video capabilities into applications
- Vertex AI, where enterprise integration will soon support Veo’s “Scene Extensions” and other key features
Availability through these platforms allows enterprise customers to choose the right environment—GUI-based or programmatic—based on their teams and workflows.
Pricing and Access
Veo 3.1 is currently available in preview, and only on the paid tier of the Gemini API. Its cost structure is similar to that of Veo 3, the previous generation of Google’s AI video model:
- Standard model: $0.40 per second of video
- Fast model: $0.15 per second of video
There is no free tier, and users are only charged if a video is successfully generated. This pricing model is consistent with previous Veo versions and offers predictable costs for budget-conscious enterprise teams.
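Because billing is strictly per second of successfully generated video, cost estimation is simple arithmetic. The short sketch below works through the numbers for a single 8-second clip at both tiers, using the rates listed above.

```python
# Per-second rates from the pricing above (USD per second of generated video).
RATES = {"standard": 0.40, "fast": 0.15}

def clip_cost(seconds: float, tier: str = "standard") -> float:
    """Estimated charge for one successfully generated clip."""
    return seconds * RATES[tier]

print(clip_cost(8, "standard"))  # 3.2 -> an 8-second standard clip costs $3.20
print(clip_cost(8, "fast"))      # 1.2 -> the same clip on the fast model costs $1.20
```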
Technical details and output control
Veo 3.1 outputs video at 720p or 1080p resolution with a 24 fps frame rate.
Duration options include 4, 6, or 8 seconds from text prompts or uploaded images, with the ability to extend a video to 148 seconds (nearly two and a half minutes) when using the “Extend” feature.
The new functionality also includes tighter control over subjects and environments. For example, enterprises can upload a product image or visual reference, and Veo 3.1 will generate visuals that preserve its appearance and stylistic cues throughout the video. This can streamline creative production pipelines for retail, advertising, and virtual content production teams.
Initial reactions
The broader creator and developer community has responded to the launch of Veo 3.1 with a mix of optimism and tempered criticism – especially when compared to rival models like OpenAI’s Sora 2.
Matt Schumer, an AI founder (of Otherside AI/HyperWrite) and early adopter, described his initial reaction as “disappointment,” noting that Veo 3.1 is “significantly worse than Sora 2” and also “significantly more expensive.”
However, he acknowledged that Google’s tooling – like support for context and visual extensions – is a bright spot in the release.
Travis Davids, a 3D digital artist and AI content creator, echoed a similar sentiment. While he noted improvements in audio quality, particularly in sound effects and dialogue, he raised concerns about the system’s limitations.
These include the lack of custom voice support, the inability to directly select the generated voices, and a continued 8-second limit on generations – despite some public claims of longer outputs.
Davids also pointed out that keeping characters consistent across changing camera angles still requires careful prompting, while other models like Sora 2 handle this more automatically. He questioned the absence of 1080p resolution for users on paid tiers such as Flow Pro and expressed doubt over feature parity.
On a more positive note, @kimonismus, an AI newsletter writer, said that “Veo 3.1 is amazing,” though he still concluded that OpenAI’s latest model remains better overall.
Collectively, these early impressions suggest that while Veo 3.1 offers meaningful tooling enhancements and new creative control features, expectations have changed as competitors raise the bar on both quality and usability.
Adoption and scale
Since launching Flow five months ago, Google says 275 million videos have been generated with its various Veo models.
The pace of adoption suggests significant interest not only from individuals but also from developers and businesses experimenting with automated content creation.
Thomas Ilzik, director of product management at Google Labs, highlighted that the Veo 3.1 release brings the model’s capabilities closer to how human filmmakers plan and shoot. These include visual composition, continuity across shots, and coordinated audio – all areas that enterprises are increasingly looking to automate or streamline.
Security and responsible AI use
Videos created with Veo 3.1 are watermarked using Google’s SynthID technology, which embeds an imperceptible identifier to indicate that content is AI-generated.
Google applies safety filters and moderation to its APIs to help reduce privacy and copyright risks. Generated content is stored temporarily and deleted after two days unless downloaded.
For developers and enterprises, these features provide assurance about provenance and compliance – critical in regulated or brand-sensitive industries.
Where Veo 3.1 stands out in the crowded AI video model space
Veo 3.1 isn’t just an iteration of previous models – it represents a deeper integration of multimodal input, storytelling controls, and enterprise-level tooling. While creative professionals may find immediate benefits in improving workflow and edit fidelity, businesses looking for automation in training, advertising or virtual experiences may find even more value in the model’s structure and API support.
Early user feedback highlights that while Veo 3.1 offers valuable tooling, expectations around realism, voice control, and generation length are rapidly evolving. As Google expands its reach through Vertex AI and continues to refine Veo, its competitive position in enterprise video generation will depend on how quickly these user concerns are addressed.

