A patent lawyer with less than three months of coding experience just classified 3.5 million US patents using a 9-billion-parameter language model running on a single consumer GPU. The whole thing took about 48 hours.
The project is a strong proof of concept for what local AI can do with real data at real scale. The setup: download every US patent from 2016 to 2025 via the USPTO's PatentsView database, load them into a 74GB SQLite file with full-text search indexing, then run Nvidia's Nemotron 9B locally on an RTX 5090 to sort each patent into one of 100 technology tags. No cloud API calls. No monthly bill. Just one graphics card doing inference (running the model to generate outputs) for two days straight.
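A run that lasts two days needs to survive interruptions, so one plausible shape for the tagging loop is a resumable batch job over the SQLite file. The sketch below is an illustration only, not the project's code: the `patents` table, its `id`, `abstract`, and `tag` columns, and the `classify` callable (a stand-in for the local model) are all assumed names.

```python
import sqlite3
from typing import Callable


def tag_untagged(
    conn: sqlite3.Connection,
    classify: Callable[[str], str],
    batch_size: int = 1000,
) -> int:
    """Tag every row whose tag is NULL, committing after each batch.

    Assumes a hypothetical table patents(id, abstract, tag).
    Because only untagged rows are selected, a restart resumes where
    the previous run stopped.
    """
    total = 0
    while True:
        batch = conn.execute(
            "SELECT id, abstract FROM patents WHERE tag IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not batch:
            return total
        # Always store a non-NULL value so the loop cannot revisit a row forever.
        updates = [
            (classify(abstract) or "unclassified", patent_id)
            for patent_id, abstract in batch
        ]
        conn.executemany("UPDATE patents SET tag = ? WHERE id = ?", updates)
        conn.commit()
        total += len(batch)
```

Committing once per batch keeps the work lost in a crash to at most one batch, at the cost of a few more disk writes.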
The search engine built on top uses BM25 ranking, a standard text-matching algorithm, with custom weights that prioritize title matches (10x) and assignee/company matches (5x) over abstract (3x) and patent claims (1x). There's also natural language query expansion, so searching in plain English returns relevant results without needing to know patent classification codes.
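The article does not say how the weights are applied, but since the data lives in SQLite, one natural fit is the `bm25()` ranking function in SQLite's FTS5 extension, which accepts one weight per indexed column. The sketch below assumes that approach; the table name `patents`, the column names, and the sample rows are invented for illustration, and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column order matters: bm25() weights are matched to columns by position.
conn.execute(
    "CREATE VIRTUAL TABLE patents USING fts5(title, assignee, abstract, claims)"
)
conn.executemany(
    "INSERT INTO patents VALUES (?, ?, ?, ?)",
    [
        ("Battery housing", "Acme Corp", "A housing for cells", "A housing comprising walls"),
        ("Vehicle frame", "Acme Corp", "A frame for vehicles", "A frame holding a battery"),
        ("Antenna array", "Radio Inc", "An array of antennas", "An array comprising elements"),
        ("Heart valve", "Medi LLC", "A valve for implantation", "A valve comprising leaflets"),
        ("Solar panel", "Sun Co", "A panel for roofs", "A panel comprising cells"),
    ],
)


def search(conn: sqlite3.Connection, query: str, limit: int = 10):
    """Rank matches with title 10x, assignee 5x, abstract 3x, claims 1x.

    FTS5's bm25() returns lower (more negative) values for better matches,
    so an ascending sort puts the best result first. The query string is
    passed as FTS5 query syntax, so raw user input would need escaping.
    """
    sql = """
        SELECT title, assignee, bm25(patents, 10.0, 5.0, 3.0, 1.0) AS score
        FROM patents
        WHERE patents MATCH ?
        ORDER BY score
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()


for title, assignee, score in search(conn, "battery"):
    print(title, assignee, round(score, 3))
```

With these weights, the patent that mentions "battery" in its title ranks ahead of the one that mentions it only in the claims.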
What This Actually Demonstrates
The interesting part is not that someone built a patent search tool. Patent search tools exist. The interesting part is the economics. An RTX 5090 costs around $2,000. Pushing 3.5 million documents through a cloud API like GPT-4o or Claude could run to thousands of dollars in token fees, depending on prompt length and pricing, and the job would be throttled by rate limits the entire time. Here, the marginal cost after buying the GPU is electricity.
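The arithmetic behind that comparison is easy to rerun with your own numbers. In the sketch below, the document count, the 48 hours, and the GPU price come from the article; the 500 tokens per document and the $2.00 per million tokens are placeholder assumptions, not quoted prices, and electricity is ignored.

```python
DOCS = 3_500_000        # from the article
HOURS = 48              # from the article
GPU_PRICE_USD = 2_000   # from the article

# Placeholder assumptions: swap in real prompt sizes and current API pricing.
ASSUMED_TOKENS_PER_DOC = 500
ASSUMED_USD_PER_MILLION_TOKENS = 2.00


def docs_per_second(docs: int, hours: float) -> float:
    """Average throughput needed to finish `docs` in `hours`."""
    return docs / (hours * 3600)


def api_cost_usd(docs: int, tokens_per_doc: int, usd_per_million: float) -> float:
    """Token fees for a metered API, ignoring output tokens and retries."""
    return docs * tokens_per_doc * usd_per_million / 1_000_000


def breakeven_docs(gpu_price: float, tokens_per_doc: int, usd_per_million: float) -> float:
    """Number of documents at which API fees equal the GPU purchase price."""
    cost_per_doc = tokens_per_doc * usd_per_million / 1_000_000
    return gpu_price / cost_per_doc


print(f"{docs_per_second(DOCS, HOURS):.2f} docs/sec")  # about 20.25
print(f"${api_cost_usd(DOCS, ASSUMED_TOKENS_PER_DOC, ASSUMED_USD_PER_MILLION_TOKENS):,.0f} in API fees")
print(f"{breakeven_docs(GPU_PRICE_USD, ASSUMED_TOKENS_PER_DOC, ASSUMED_USD_PER_MILLION_TOKENS):,.0f} docs to break even")
```

Under these placeholder figures the API bill and the break-even point land in the same range as the GPU price, which is why the answer depends heavily on the real token count per patent.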
This is the pattern that keeps showing up in local AI: once you own the hardware, batch-processing millions of documents becomes a fixed-cost problem instead of a per-unit-cost problem. For law firms, research institutions, or anyone sitting on large document collections, that shift makes feasible the kind of whole-archive processing that per-token pricing would rule out.
The 9B-parameter model is small enough to run on consumer hardware but apparently accurate enough for broad technology classification. Patent classification doesn't need the nuance of a 70B or 400B model. It needs to read an abstract and pick from 100 categories, which is exactly the kind of structured task where smaller models hold up well.
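Part of what makes a pick-one-of-N task forgiving is that the output can be checked mechanically. The sketch below shows one way to do that, assuming a closed tag list and a `generate` callable that stands in for the local model; the three tags, the prompt wording, and every function name here are hypothetical, not taken from the project.

```python
from typing import Callable, Sequence

# Placeholder list; the project used 100 technology tags.
TAGS = ["batteries", "semiconductors", "medical devices"]


def build_prompt(abstract: str, tags: Sequence[str]) -> str:
    """Ask for exactly one tag from a closed list."""
    tag_lines = "\n".join(f"- {tag}" for tag in tags)
    return (
        "Classify the patent abstract into exactly one of these tags.\n"
        f"{tag_lines}\n\n"
        f"Abstract: {abstract}\n"
        "Answer with the tag only."
    )


def parse_tag(raw: str, tags: Sequence[str], fallback: str = "unclassified") -> str:
    """Accept the reply only if it matches a known tag, ignoring case and stray punctuation."""
    cleaned = raw.strip().strip(".\"'").lower()
    lookup = {tag.lower(): tag for tag in tags}
    return lookup.get(cleaned, fallback)


def classify(abstract: str, tags: Sequence[str], generate: Callable[[str], str]) -> str:
    """Run one abstract through the model stand-in and validate the reply."""
    return parse_tag(generate(build_prompt(abstract, tags)), tags)
```

Anything the model returns outside the list falls back to "unclassified", so a small model's occasional off-script answer shows up as a countable failure instead of a corrupted tag.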
The creator went from zero coding experience in December 2025 to processing millions of records less than three months later. That timeline says as much about the current state of AI tooling as it does about the project itself.