I used Google's Veo 3 to make AI ASMR food videos

Google's Veo 3 AI video model is a league above its competitors for one major reason: sound. You can prompt not only what you see on screen, but also what you hear.

Built by Google’s DeepMind lab, the first Veo model debuted in May 2024, and each new generation has added more functionality. It has always excelled in speed, accuracy, and physics understanding compared with its competitors, but the addition of sound was a game-changer.

You can use it to make a short commercial, a scene from a film, or even a music video. But there is one use I have seen more than any other: ASMR (autonomous sensory meridian response) videos, which feature gentle tapping, whispers, and ambient sounds that trigger a tingling sensation for some people.

To see how far it can go, I created a series of ASMR food prompts, each designed to generate a video and matching sound around a culinary theme.

Gemini logo (Image credit: Shutterstock)

Veo 3 in the Gemini app

Veo 3 is now available in the Gemini app. When starting a new prompt, just select the video option, type what you want, and it generates an 8-second clip.

While Gemini is not necessarily the best way to access Veo 3 (I would recommend Freepik, Fal, Higgsfield, or Google Flow), it is easy to use and gets the job done.

An important advantage of using Gemini directly is that it automatically interprets and enhances your prompts. So if you ask for “a cool ASMR video featuring lasagna,” that is what you will get.

You can be more specific using something called structured prompting, which labels each moment with a timestamp and visual details. But unless you need precise control, a simple paragraph (aka a narrative prompt) is usually more effective.
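As a rough illustration of structured prompting, the Python sketch below assembles a timestamped prompt from a list of shots. The `Shot` class, the `build_structured_prompt` helper, and the `[MM:SS-MM:SS]` label format are all assumptions made for this example, not an official Veo 3 or Gemini format. The output is plain text you would paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One labeled moment in the clip (hypothetical structure)."""
    start: int   # seconds from the start of the clip
    end: int     # seconds from the start of the clip
    visual: str  # what should be on screen
    sound: str   # what should be heard


def fmt(seconds: int) -> str:
    """Format whole seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def build_structured_prompt(shots: list[Shot], clip_length: int = 8) -> str:
    """Join shots into one timestamped prompt, checking they fit the clip."""
    lines = []
    for shot in shots:
        if not 0 <= shot.start < shot.end <= clip_length:
            raise ValueError(f"shot {shot} does not fit in {clip_length}s")
        lines.append(
            f"[{fmt(shot.start)}-{fmt(shot.end)}] "
            f"Visual: {shot.visual} Sound: {shot.sound}"
        )
    return "\n".join(lines)


# Example: a fizzy drink with ice, split across an 8-second clip.
prompt = build_structured_prompt([
    Shot(0, 3, "Ice cubes drop into a tall glass.", "Sharp clinks of ice on glass."),
    Shot(3, 8, "A fizzy drink is poured over the ice.", "Close-up fizzing and crackling."),
])
print(prompt)
```

Keeping visuals and sound as separate fields makes it harder to forget the audio description for a given moment, which matters for an ASMR clip.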

Writing the prompts

The first task in any AI project is thinking about your prompt. Models are getting better at interpreting intent, but if you want a particular result, it still pays to be specific.

I knew I wanted an ASMR food video, so I started with a test: “ASMR food video with sound.”

The result? Decent. It gave me essentially what I had in mind. Then I refined it by outlining specific food types, adding sound details, and even trying a structured prompt for a fizzy drink with ice.

Most of the time, the narrative prompt works best. Just describe what you want to see, the flow of the video, and how the sound should come through.

1. Lasagna sizzling from the pan

Google Veo 3 Lasagne Video – YouTube

The first prompt, “ASMR food video with sound,” produced a striking clip of a person sliding a fork into a slice of lasagna. As the fork goes in, you hear the squish, then a clang as it hits the plate. This is a case where I wish Veo 3 had an “extend clip” button.

The prompt included no other instructions, so I had no way of knowing what the food would be, how the sound would come out, or whether the sound would work at all. This is why it is important to be specific when prompting an AI model, even in chatbots like Gemini.

2. Cooking and eating

Google Veo 3 Cooking Video – YouTube

Next, I got more specific: a longer, narrative-style prompt asked Veo 3 to generate close-ups of a chef preparing and eating a satisfying meal in a well-lit kitchen.

I asked for slow shots of ingredients being sliced, sizzling in a pan, and a crunch as the chef eats.

I also added this line: “Emphasize audio quality: clean, layered ASMR soundscape without music.” It directs not only the sound itself, but also the style of the sound and what I do not want to hear.

3. Popcorn popping

Google Veo 3 Popcorn Video – YouTube

For the last prompt, I started with an image. I used Midjourney v7 to create a picture of a woman looking at rainbow popcorn, then added the prompt “ASMR food” in Gemini.

Visually, the result was amazing, but for some reason the woman says in a voiceover, “It is delicious, it is a rainbow popcorn.” This is on me: I did not specify whether she should speak, or what she should say.

A simple fix: put any speech you want in quotation marks. For example, I could have prompted her to say, “I like to watch popcorn pop,” with emphasis on the word “pop.” I could also have specified that she was speaking on camera, and Veo 3 would have synced the lip movement to match.
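The quoting tip can be sketched in a few lines of Python. The `with_dialogue` helper and its parameters are hypothetical names for this example, and the wording it produces is just one plausible phrasing, not a format Veo 3 requires.

```python
def with_dialogue(scene: str, speaker: str, line: str,
                  emphasis: str = "", on_camera: bool = True) -> str:
    """Append a quoted spoken line to a scene description (hypothetical helper)."""
    delivery = "speaking on camera" if on_camera else "in voiceover"
    # Put the exact words in double quotes so they read as dialogue.
    parts = [scene.rstrip(". "), f'{speaker}, {delivery}, says: "{line}"']
    if emphasis:
        if emphasis.lower() not in line.lower():
            raise ValueError("emphasized word must appear in the spoken line")
        parts.append(f'Emphasize the word "{emphasis}"')
    return ". ".join(parts) + "."


prompt = with_dialogue(
    "ASMR food video of a woman watching rainbow popcorn pop",
    "The woman",
    "I like to watch popcorn pop",
    emphasis="pop",
)
print(prompt)
```

Passing `on_camera=False` switches the wording to a voiceover, which makes the choice explicit instead of leaving it to the model.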

Conclusion

Overall, Veo 3 delivers impressive results, especially when it comes to producing high-quality sound that accurately reflects the scene. There are some quirks to navigate, such as an unexpected voiceover or slightly off-looking lasagna, but these are easily addressed with more specific prompts.
