Google's Gemma 4 open-source AI model now runs fully on an iPhone - offline, locally, with no data leaving the device.
On-device inference means the model processes your queries directly on the phone's own chips rather than routing them to a remote server. According to coverage from Gizmo Week, Gemma 4 uses the iPhone's Neural Engine - Apple's dedicated hardware for machine-learning workloads - to run with no cloud support at all.
Gemma is Google's family of open-source, lightweight AI models built to run outside of Google's own data centers. Getting a version to operate natively on consumer mobile hardware without any connectivity is a real capability milestone, not a marketing claim. These models are compact by design - small enough to fit on a phone's storage and fast enough on dedicated chips to return responses without the round-trip delay of hitting a server.
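To put "small enough to fit on a phone's storage" in rough numbers, here is a back-of-the-envelope sketch. It assumes the footprint of a model's weights is roughly parameter count times bits per weight, and it ignores runtime memory and app overhead. The parameter counts and precisions below are hypothetical placeholders chosen for illustration, not figures reported for Gemma 4.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical sizes for illustration only, NOT Gemma 4's published specs.
    for params in (2e9, 4e9):
        for bits in (16, 8, 4):
            size = weight_footprint_gb(params, bits)
            print(f"{params / 1e9:.0f}B params at {bits}-bit: ~{size:.1f} GB")
```

Under these assumptions, storing weights at lower precision is what moves a model from "too big for most phones" into the low single-digit gigabytes.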
This matters in a few practical cases: field workers without reliable signal, healthcare or legal professionals who can't send client data to third-party servers, and anyone who'd rather not have their AI queries logged anywhere. Full offline operation gives users something cloud-based tools like the Claude mobile app can't offer - AI that works without your data ever leaving the phone.
Apple has been building out Apple Intelligence as its own on-device AI system, but Gemma 4 running on iPhone hardware adds a Google-built alternative over which neither company acts as gatekeeper.
For developers, the open-source nature means they can build Gemma 4 applications without API costs, usage rate limits, or dependence on any company's server uptime. That's a meaningful difference from the cloud-hosted commercial AI products that dominate the market right now.