
NuExtract3: A 4B Open-Weight Model That Turns Documents Into Structured Data

May 25, 2026

Most document extraction pipelines are three jobs stitched together: OCR to read the text from an image, a parser to organize the layout, then an LLM call to pull out specific fields. NuExtract3 tries to collapse all three into a single model.

NuExtract3 is a 4-billion-parameter vision-language model (VLM) - a model that takes both images and text as input - built specifically for pulling structured data out of documents. It handles OCR (reading text out of images rather than from machine-readable files), Markdown conversion, and structured extraction in one pass. Feed it a scanned invoice or a photographed form and you get back clean Markdown or JSON with the relevant fields separated out.
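To make "JSON with the relevant fields separated out" concrete, here is a minimal Python sketch of checking a model's reply against the fields you asked for. The template, the field names, the sample reply, and the `check_extraction` helper are all illustrative assumptions. None of them is NuExtract3's actual input format or API.

```python
import json

# Hypothetical template listing the fields we want from an invoice.
# The field names are made up for illustration.
INVOICE_TEMPLATE = {
    "invoice_number": "",
    "issue_date": "",
    "total": "",
    "line_items": [{"description": "", "amount": ""}],
}


def missing_fields(template, data, path=""):
    """Return the paths of template keys that are absent from the data."""
    missing = []
    if isinstance(template, dict):
        if not isinstance(data, dict):
            return [path or "<root>"]
        for key, sub_template in template.items():
            child = f"{path}.{key}" if path else key
            if key not in data:
                missing.append(child)
            else:
                missing.extend(missing_fields(sub_template, data[key], child))
    elif isinstance(template, list):
        if not isinstance(data, list):
            return [path or "<root>"]
        if template:
            # Every list element is checked against the first template item.
            for i, item in enumerate(data):
                missing.extend(missing_fields(template[0], item, f"{path}[{i}]"))
    return missing


def check_extraction(raw_reply, template=INVOICE_TEMPLATE):
    """Parse a JSON reply and report which template fields are missing."""
    data = json.loads(raw_reply)
    return data, missing_fields(template, data)


# A made-up reply standing in for model output (not a real NuExtract3 result).
sample_reply = (
    '{"invoice_number": "INV-1", "issue_date": "2026-01-01", '
    '"line_items": [{"description": "Widget"}]}'
)
_, gaps = check_extraction(sample_reply)
print(gaps)  # ['total', 'line_items[0].amount']
```

A check like this is useful with any extraction model: it flags pages where a field came back missing so they can be routed to a human instead of silently entering a database.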

The Self-Hosting Argument

The model is released as open weights, meaning you can download the model parameters and run everything locally. At 4 billion parameters, it fits on a single mid-range GPU - an Nvidia RTX 3080 or similar consumer card - without a dedicated server cluster.
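The "fits on a single GPU" claim comes down to simple arithmetic on weight storage. The sketch below estimates memory for the weights alone at several numeric precisions. The precisions are assumptions, since the article does not say which format the released weights use, and the estimate ignores activations, image-encoding overhead, and framework buffers, which add to the real footprint.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory for model weights only, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 4e9  # 4 billion parameters, as stated in the article

# Bytes per parameter for common storage formats (assumed, not confirmed
# for this model).
for label, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = weight_memory_gb(N_PARAMS, bytes_per_param)
    print(f"{label:>9}: {gb:.1f} GB")
```

At 16-bit precision the weights come to about 8 GB, which is under the 10 GB of memory on the original RTX 3080 but leaves limited headroom for everything else. Quantized formats, if available for the model, would loosen that constraint.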

For document extraction specifically, that matters. Invoice processing, contract review, patient intake forms, customer records - these are exactly the workflows where companies are reluctant to route data through third-party APIs. Running NuExtract3 on your own infrastructure keeps sensitive documents off external servers.

Commercial alternatives like AWS Textract or Azure Document Intelligence are managed services with per-page pricing. They work well, but they mean your data leaves your environment. For teams with compliance constraints or just a preference for controlling their own stack, a 4B model you can run on a $500 GPU changes the economics: a recurring per-page fee becomes a one-time hardware purchase plus the cost of running it.
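A rough way to see where the economics flip is a break-even page count. The sketch below uses the article's $500 GPU figure. The per-page rate is a placeholder assumption, not a quoted price from any vendor, and the calculation deliberately leaves out electricity, setup time, and maintenance.

```python
def break_even_pages(hardware_cost_usd, price_per_1000_pages_usd):
    """Pages at which cumulative per-page fees equal the hardware cost."""
    return hardware_cost_usd / price_per_1000_pages_usd * 1000


GPU_COST_USD = 500             # from the article
PRICE_PER_1000_PAGES_USD = 10  # placeholder assumption, check your vendor's pricing

pages = break_even_pages(GPU_COST_USD, PRICE_PER_1000_PAGES_USD)
print(f"Break-even at about {pages:,.0f} pages")  # Break-even at about 50,000 pages
```

Under that placeholder rate, a team processing a few thousand pages a month would take over a year to recoup the card on fees alone, so for low volumes the case for self-hosting rests more on data control than on cost.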

What 4B Parameters Gets You

NuExtract3 is not a general-purpose reasoning model. It won't write your emails or summarize a quarterly report. It's a narrow specialist: structured extraction from document images.

For that specific job, a purpose-built 4B model running locally can match or beat a general-purpose large model that bills per API call and processes your client data off-site. The tradeoff is upfront setup time and hardware cost on one side, and ongoing API spend and data sovereignty concerns on the other.

The vision-language capability is the practical differentiator versus text-only extraction tools. Scanned archives, photographed receipts, and handwritten forms are all fair game - not just clean PDFs with readable text layers.
