Google just shipped a new visual mode for Gemini: ask it a question, and instead of text or a static image, you might get a 3D model you can rotate or a simulation with adjustable sliders.
The mechanics are straightforward. Inside the chat interface, Gemini can now render 3D objects that respond to mouse input: you can spin them and examine them from different angles. For simulations, you can adjust the underlying parameters and watch the output update in real time. The whole interaction happens inside the conversation window, with no external tool required.
This kind of response is multimodal: it combines different types of output (text, images, interactive graphics) rather than sticking to a single format. That has been the theoretical promise of AI assistants for years, yet most have stayed in text-plus-image territory. Gemini is now pushing into interactive output.
The obvious use cases are education and design. A student asking how a pendulum behaves under different gravity conditions could actually test it. A product designer exploring a component could rotate it before moving to CAD software. The real test is whether the generated models are accurate enough to trust for anything technical, because a wrong but interactive model is worse than a correct static diagram.
Neither ChatGPT nor Claude currently offers a directly comparable built-in feature, which could give Google a real point of differentiation, assuming the quality holds up well enough to make it genuinely useful rather than a showcase feature.
Google hasn't specified which Gemini plans include the feature or given a complete rollout timeline.