Google Digs Into Generative AI for New Audio, Video, and Images

Veo is the newest model for video and Imagen 3 is for text-to-image

Published on May 14, 2024 03:40PM EDT

Soon, a simple voice or text prompt will be all you need to create the perfect video, image, or song.

The Google I/O developer conference featured a ton of new capabilities rolling out to users today and in the coming weeks and months. Among those are updates to how you create videos, images, and music using generative AI.

The Google team has been working on improving generative AI, and among the advances they've made (including things like AI in picture search that's pretty stinkin' cool) are new video creation capabilities using a model called Veo, text-to-image improvements in Imagen 3, and new capabilities in Google's Music AI Sandbox.

Wyclef Jean working with Google's Music AI Sandbox.
Google

First up is Veo, the new generative AI model that can help you create 1080p videos using text, image, or voice prompts. Veo can match a video to the style of a photo you provide and can draw on some new tools to build the video you imagine. For example, Veo understands terms like "time-lapse," "tracking shot," or "aerial shot" to better create the frames you want.

"With Veo, we’ve improved techniques for how the model learns to understand what's in a video, renders high-definition images, simulates the physics of our world and more. These learnings will fuel advances across our AI research and enable us to build even more useful products that help people interact and communicate in new ways," said Eli Collins, VP, Product Management, and Douglas Eck, Senior Research Director, in Google's post announcing the new capabilities.

Imagen 3 is the newest image model Google announced during the conference. Google says it "better understands natural language, the intent behind your prompt and incorporates small details from longer prompts." The result is that you get better, more realistic images with more detail than with previous models. Imagen 3 is also supposed to render text better than previous versions, letting you create personalized messages and other text decorations.

An image of a robot holding a bird, created by Imagen 3. Imagen 3 can create more detail than previous text-to-image AI models.
Google

Finally, Google spent some time focused on its Music AI Sandbox, which allows musicians to create music using AI. New features include instrumental additions and more ways to express creative styles. Google has partnered with musicians such as Wyclef Jean, Marc Rebillet, and Justin Tranter to continue developing Music AI Sandbox and expanding the capabilities of Gemini AI for creating music.

All of these features are in some phase of testing. Veo and Imagen 3 are available to select creators through VideoFX (if you're not already a part of VideoFX, you'll have to get on the waitlist), while Music AI Sandbox is still limited to the artists Google has invited to participate.
