
H Company's Holo3.1 Cuts Computer-Use Agent Step Time to 3.3 Seconds

Holo3.1: Fast & Local Computer Use Agents
Image: Hugging Face

3.3 seconds. That's the average step time for Holo3.1 when an AI agent controls a computer - down from 6.8 seconds in the previous Holo3 release. H Company published the new model family on June 2, 2026, with four sizes designed to run locally on hardware you already own.

Computer use agents are AI systems that operate software directly: clicking buttons, filling forms, navigating browsers, and using desktop applications. Unlike API-based automation, they work with any interface a human can use - no custom integrations required.
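Conceptually, a computer use agent runs a loop: capture the screen, ask the model for the next action, carry it out, and repeat until the task is done. One pass through that loop roughly corresponds to one agent "step". The Python sketch below shows the loop; the `Action` type, its `kind` values, and the three callbacks are hypothetical stand-ins, not Holo3.1's actual interface or H Company's harness.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    # Hypothetical action vocabulary: "click", "type", or "done"
    kind: str
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    capture_screen: Callable[[], bytes],
    choose_action: Callable[[bytes, str], Action],
    execute: Callable[[Action], None],
    goal: str,
    max_steps: int = 20,
) -> int:
    """Observe-decide-act loop. Returns how many model calls (steps) were made."""
    for step in range(1, max_steps + 1):
        screenshot = capture_screen()             # observe: grab the current screen
        action = choose_action(screenshot, goal)  # decide: the model picks one action
        if action.kind == "done":
            return step                           # model reports the task is finished
        execute(action)                           # act: click or type on the real UI
    return max_steps                              # gave up after the step budget


# Usage with a scripted stand-in for the model:
script = iter([Action("click", 120, 48), Action("type", text="hello"), Action("done")])
executed: list[Action] = []
steps = run_agent(lambda: b"<png bytes>", lambda img, goal: next(script), executed.append, "say hello")
print(steps, [a.kind for a in executed])  # 3 ['click', 'type']
```

In a real deployment, `choose_action` would wrap a call to the model and `execute` would drive the operating system's mouse and keyboard.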

Four Sizes, One Benchmark Story

The Holo3.1 family runs from 0.8B (ultra-lightweight) through 4B and 9B to a 35B-A3B flagship (35B total parameters, roughly 3B active per token). All are built on the Qwen base architecture. On AndroidWorld - a benchmark measuring how well an agent completes real tasks on an Android phone - the 35B model scores 79.3%, up from 67% in Holo3. The 4B model hits 72%, up from 58%. A 14-point jump on a smaller model is meaningful: the 4B model's failure rate falls from 42% to 28%, so roughly one more task in every seven now finishes instead of stalling.
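Percentage-point gains are easier to interpret as failure rates. This short calculation uses only the AndroidWorld scores quoted above and treats each score as a plain task success rate, which is a simplifying assumption about how the benchmark is reported.

```python
# AndroidWorld success rates (percent): (Holo3, Holo3.1)
scores = {
    "35B-A3B": (67.0, 79.3),
    "4B": (58.0, 72.0),
}


def summarize(before: float, after: float) -> dict:
    gain = after - before          # absolute gain in percentage points
    fail_before = 100.0 - before   # share of tasks failed before
    fail_after = 100.0 - after     # share of tasks failed after
    # relative reduction in the failure rate
    relative_error_cut = (fail_before - fail_after) / fail_before
    return {"gain_pts": gain, "fail_before": fail_before,
            "fail_after": fail_after, "relative_error_cut": relative_error_cut}


for name, (b, a) in scores.items():
    s = summarize(b, a)
    print(f"{name}: +{s['gain_pts']:.1f} pts, failures {s['fail_before']:.1f}% -> "
          f"{s['fail_after']:.1f}% ({s['relative_error_cut']:.0%} fewer)")
# 35B-A3B: +12.3 pts, failures 33.0% -> 20.7% (37% fewer)
# 4B: +14.0 pts, failures 42.0% -> 28.0% (33% fewer)
```

Seen this way, both models cut their failure rate by roughly a third relative to Holo3.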

Speed on Consumer Hardware

Part of the latency improvement comes from three quantized formats - compressed model versions that trade a small amount of accuracy for faster processing:

  • NVFP4: A 4-bit floating-point format optimized for NVIDIA GPUs, 1.74x faster than BF16 (the 16-bit format models are commonly served in)
  • FP8: An 8-bit format, a middle ground between precision and speed
  • Q4 GGUF: 4-bit weights in the GGUF file format used by tools like Ollama and llama.cpp, enabling local deployment on Windows and Mac, including Apple Silicon
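To make the accuracy trade-off concrete, here is a simplified sketch of block-scaled 4-bit quantization in the spirit of NVFP4. Each block of weights shares one scale factor, and every value is rounded to the nearest point on the 4-bit E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6). This is an illustration only: it keeps the scale as an ordinary Python float and ignores how NVFP4 actually encodes its scales, so it does not reproduce the format bit-for-bit.

```python
# Representable magnitudes of a 4-bit E2M1 float (the sign bit is handled separately)
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Scale the block so its largest magnitude maps to 6.0 (top of the E2M1 grid),
    then round each scaled value to the nearest grid point."""
    max_abs = max(abs(v) for v in block)
    scale = max_abs / 6.0 if max_abs > 0 else 1.0
    codes = []
    for v in block:
        mag = min(E2M1_VALUES, key=lambda q: abs(abs(v) / scale - q))
        codes.append(-mag if v < 0 and mag else mag)  # avoid emitting -0.0
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    """Recover approximate weights by multiplying each code by the shared scale."""
    return [scale * c for c in codes]


weights = [0.12, -0.03, 0.5, -0.9, 0.33, 0.0, 0.07, -0.41]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
print(codes)
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

Each weight now needs only 4 bits plus a share of one scale, and the rounding error printed on the second line is the "small amount of accuracy" the format gives up.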

The roughly 2x end-to-end speedup (6.8 to 3.3 seconds per step) combines NVFP4 precision with changes to the agent harness - the software scaffolding that passes the current screen to the model and carries out the actions it picks.
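To see what the per-step figures mean for a whole task, multiply by the number of steps. The task lengths below (10 and 25 steps) are hypothetical, and the calculation ignores retries and any time outside the reported per-step averages.

```python
HOLO3_STEP_S = 6.8   # reported average step time, Holo3 (seconds)
HOLO31_STEP_S = 3.3  # reported average step time, Holo3.1 (seconds)


def task_seconds(step_s: float, steps: int) -> float:
    """Wall-clock estimate for a task that takes `steps` agent steps."""
    return step_s * steps


speedup = HOLO3_STEP_S / HOLO31_STEP_S
print(f"end-to-end speedup: {speedup:.2f}x")  # end-to-end speedup: 2.06x
for steps in (10, 25):  # hypothetical task lengths
    print(f"{steps} steps: {task_seconds(HOLO3_STEP_S, steps):.1f}s -> "
          f"{task_seconds(HOLO31_STEP_S, steps):.1f}s")
# 10 steps: 68.0s -> 33.0s
# 25 steps: 170.0s -> 82.5s
```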

Holo3.1 also adds native function-calling support, outputting structured JSON that agent frameworks can consume directly without a separate parsing layer. For developers chaining computer use with other tools in an automation pipeline, that removes an integration step.
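Because the model emits structured JSON, the consuming side reduces to parsing and dispatch. The sketch below assumes a tool call shaped like `{"name": ..., "arguments": {...}}`, a common convention in function-calling APIs; the tool names `click` and `type_text` and their arguments are hypothetical, not Holo3.1's documented action schema.

```python
import json
from typing import Any, Callable


# Hypothetical local tools the agent framework exposes to the model
def click(x: int, y: int) -> str:
    return f"clicked at ({x}, {y})"


def type_text(text: str) -> str:
    return f"typed {text!r}"


TOOLS: dict[str, Callable[..., Any]] = {"click": click, "type_text": type_text}


def dispatch(raw: str) -> Any:
    """Parse one JSON tool call from the model and run the matching local function."""
    call = json.loads(raw)
    name = call["name"]
    args = call.get("arguments", {})
    if isinstance(args, str):  # some APIs encode arguments as a JSON string
        args = json.loads(args)
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)


print(dispatch('{"name": "click", "arguments": {"x": 640, "y": 360}}'))  # clicked at (640, 360)
```

Rejecting unknown tool names, rather than ignoring them, keeps a malformed call from silently doing nothing in the middle of a pipeline.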

All model weights and quantized checkpoints are on Hugging Face. H Company also offers a cloud API at hcompany.ai for teams that prefer managed inference. The 4B model running locally is the most accessible starting point for most developers - small enough for consumer hardware, and 14 points ahead of its Holo3 predecessor on AndroidWorld.