Netflix Open-Sources VOID, an AI Model That Erases Objects from Video

Netflix has released its first publicly available AI model, and it's not a recommendation algorithm. VOID (Video Object and Interaction Deletion) is an open-source model that removes objects from video footage while also erasing the physical effects those objects caused: shadows, reflections, and even collisions.

That last part is what makes this interesting. Existing video inpainting tools, which fill removed areas with plausible background, can erase a ball from a scene. VOID can erase the ball and undo the fact that it knocked something over, restoring the scene as if the ball never existed. Netflix calls these "physically consistent counterfactual outcomes," which is a fancy way of saying the model reasons about cause and effect, not just pixels.

How It Works

VOID uses a two-stage approach. First, a vision-language model (an AI that understands both images and text) identifies which parts of the scene are causally affected by the object being removed. Then a video diffusion model generates the replacement footage. An optional additional pass, run after these two stages, uses a technique called flow-warped noise to clean up morphing artifacts, where object shapes shift unnaturally between frames.
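To make the order of operations concrete, here is a toy sketch of that control flow in Python. It is not VOID's code or API: the function names (remove_object, shadow_below, fill_with_background) are hypothetical, and the two stand-in stages are trivial rules rather than a vision-language model and a diffusion model. The only point is the shape of the pipeline: find what the object affected, regenerate the object plus that region, then optionally refine.

```python
from typing import Callable, List, Optional, Set, Tuple

Pixel = Tuple[int, int]      # (row, col) position in a frame
Frame = List[List[int]]      # toy grayscale frame: rows of intensities
Video = List[Frame]


def remove_object(
    video: Video,
    object_mask: Set[Pixel],
    find_affected: Callable[[Video, Set[Pixel]], Set[Pixel]],
    generate: Callable[[Video, Set[Pixel]], Video],
    refine: Optional[Callable[[Video], Video]] = None,
) -> Video:
    """Hypothetical pipeline skeleton; every stage is passed in by the caller."""
    # Stage 1: decide which pixels the object influenced (shadows, collisions, ...).
    affected = find_affected(video, object_mask)
    # Stage 2: regenerate the object's pixels and everything it affected.
    result = generate(video, object_mask | affected)
    # Optional extra pass to clean up artifacts in the generated footage.
    if refine is not None:
        result = refine(result)
    return result


# Toy stand-ins (assumptions for illustration only).
def shadow_below(video: Video, object_mask: Set[Pixel]) -> Set[Pixel]:
    """Pretend the object casts a shadow on the pixel directly beneath it."""
    rows = len(video[0])
    return {(r + 1, c) for (r, c) in object_mask if r + 1 < rows}


def fill_with_background(video: Video, region: Set[Pixel], background: int = 0) -> Video:
    """Pretend 'generation' just paints the region with a flat background value."""
    return [
        [[background if (r, c) in region else value for c, value in enumerate(row)]
         for r, row in enumerate(frame)]
        for frame in video
    ]


frame = [
    [0, 9, 0],   # 9 = the object
    [0, 5, 0],   # 5 = its shadow
    [0, 0, 0],
]
cleaned = remove_object([frame], {(0, 1)}, shadow_below, fill_with_background)
print(cleaned)
```

In this toy, a plain inpainting tool would correspond to passing only the object mask to the generator, which would leave the shadow pixel behind. Widening the regenerated region to include the affected pixels is what removes it.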

The model was built by researchers at Netflix and INSAIT (Sofia University) on top of Alibaba's CogVideoX-Fun architecture, a 5-billion parameter base model. It handles up to 197 frames at 384x672 resolution.

Practical Limits

This is firmly a research release, not a production tool. You'll need a GPU with 40GB+ of VRAM (an NVIDIA A100 or equivalent) to run it, hardware that costs thousands of dollars. The resolution cap means you're not processing 4K footage. And Netflix itself describes it as research-oriented.
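A quick back-of-envelope calculation puts those numbers in perspective. The sketch below uses only the figures quoted above plus two assumptions that are not from Netflix: that weights are stored at 16 bits (2 bytes) per parameter, and that "4K" means UHD at 3840x2160.

```python
PARAMS = 5_000_000_000                 # 5-billion-parameter base model
BYTES_PER_PARAM = 2                    # assumption: 16-bit weights
FRAMES, HEIGHT, WIDTH = 197, 384, 672  # maximum clip size quoted above
UHD_WIDTH, UHD_HEIGHT = 3840, 2160     # assumption: "4K" means UHD

# Memory needed just to hold the weights, in decimal gigabytes.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

# Pixels the model produces per frame and per maximum-length clip.
pixels_per_frame = HEIGHT * WIDTH
pixels_per_clip = pixels_per_frame * FRAMES

# How many times more pixels a single UHD frame has than a VOID frame.
uhd_ratio = (UHD_WIDTH * UHD_HEIGHT) / pixels_per_frame

print(f"weights: {weights_gb:.0f} GB")
print(f"pixels per frame: {pixels_per_frame:,}")
print(f"pixels per clip: {pixels_per_clip:,}")
print(f"UHD frame vs VOID frame: {uhd_ratio:.2f}x")
```

Under those assumptions the weights alone take about 10 GB, so most of the 40GB+ requirement presumably goes to activations and intermediate buffers during generation, though Netflix's figures do not break that down. A single UHD frame has roughly 32 times as many pixels as a frame at VOID's maximum resolution.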

But for VFX and post-production teams, even a proof-of-concept like this points somewhere useful. Manual frame-by-frame object removal is one of the most tedious tasks in video editing. A model that handles the physics of removal, not just the visual fill, would save significant cleanup time.

The code is on GitHub under the Netflix organization, and model weights are available on Hugging Face. Anyone with the hardware can try it today.