
Google's Gemini Omni Accepts Any Input and Outputs Any Format

May 23, 2026 2 min read

Image: Google

Most "multimodal" AI models only go one direction: feed in an image, get back text. Google's new Gemini Omni changes that equation. It accepts text, images, audio, and video as input and can output all of those formats too, meaning you can, in theory, hand it a photo and get back a video, or describe a scene in text and receive a generated clip.

That's what "anything-to-anything" actually means in practice, and it's a meaningful step beyond where most AI tools sit today. Models like GPT-4o handle mixed inputs, but their native outputs skew heavily toward text and images. Gemini Omni's pitch is that every modality is a first-class citizen on both the input and the output side.

What This Looks Like in Practice

The hands-on demo that surfaced publicly involved deepfaking a child's stuffed deer: making the plush toy appear to be on vacation in a series of AI-generated videos. That's a frivolous example, but it illustrates the underlying capability clearly: take a real object (the toy), give the model context (vacation), and get back coherent video output stitching those elements together.

The same pipeline is directly useful for marketers making product demo videos, content creators doing B-roll work, or small business owners who need visual assets without a production budget. You're not just generating images from text anymore; you're compositing across formats.

How It Compares

OpenAI's Sora handles text-to-video. Runway and Kling handle video-to-video and image-to-video. What Gemini Omni appears to be doing differently is collapsing those separate tools into a single model with a unified interface: you describe what you want and let the model figure out which transformation is needed.

Whether it executes those transformations as well as purpose-built tools is still an open question. Unified models often trade peak performance in any one modality for breadth across all of them. The deepfake deer test suggests the output quality is at least good enough to be convincing, a bar that was not easy to clear even 18 months ago.

Google has not announced standalone pricing for Gemini Omni yet. Access appears to be rolling out through the existing Gemini platform.

