A 21.7-point accuracy jump on the same model, same questions, same data. The only difference: how the spreadsheet was fed to GPT.
Credal, an enterprise AI platform, published benchmarks showing that preprocessing corporate spreadsheets before sending them to GPT models produces dramatic accuracy improvements. The technique addresses a problem anyone who has tried to query a large Excel file through ChatGPT knows well: the model either silently drops data, wastes its context window (the amount of text a model can process at once) parsing cell structure, or just times out.
The Problem With Real-World Spreadsheets
Corporate spreadsheets are nothing like the clean CSVs used in AI demos. They can have 100+ tabs, merged cells, pivot tables, and thousands of rows where the same value repeats endlessly. When GPT tries to ingest these files raw, it burns through its context window on structural noise instead of actual data. The result: truncated answers, hallucinated numbers, or no answer at all.
Credal built a tool called ReadSpreadsheet that does two things before the data ever reaches GPT. First, it scans every sheet's name, dimensions, headers, and sample rows, then scores each sheet's relevance to the user's question. Instead of dumping all 100+ tabs into the model, it passes only the ones that matter.
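The sheet-selection step can be sketched roughly as follows. This is a minimal illustration, not Credal's implementation: the article does not say how ReadSpreadsheet scores relevance, so the keyword-overlap scoring, the weights, and the names SheetSummary, score_sheet, and select_sheets are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SheetSummary:
    """Lightweight description of one sheet (hypothetical structure)."""
    name: str
    n_rows: int
    n_cols: int
    headers: list[str]
    sample_rows: list[list[str]] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_sheet(sheet: SheetSummary, question: str) -> float:
    """Score a sheet by keyword overlap with the question.

    Assumption: matches in the sheet name count more than matches in
    headers, which count more than matches in sample cells. The weights
    (3, 2, 1) are arbitrary illustration values.
    """
    q = tokenize(question)
    if not q:
        return 0.0
    name_tokens = tokenize(sheet.name)
    header_tokens = tokenize(" ".join(sheet.headers))
    sample_tokens = tokenize(
        " ".join(str(cell) for row in sheet.sample_rows for cell in row)
    )
    raw = (
        3 * len(q & name_tokens)
        + 2 * len(q & header_tokens)
        + 1 * len(q & sample_tokens)
    )
    return raw / len(q)


def select_sheets(
    sheets: list[SheetSummary], question: str, top_k: int = 3
) -> list[SheetSummary]:
    """Return up to top_k sheets with a non-zero score, best first."""
    scored = [(score_sheet(s, question), s) for s in sheets]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [sheet for _, sheet in relevant[:top_k]]


sheets = [
    SheetSummary("Q3 Revenue", 1200, 8, ["Region", "Revenue", "Quarter"]),
    SheetSummary("Headcount", 300, 5, ["Team", "Employees"]),
    SheetSummary("Notes", 10, 1, ["Comment"]),
]
selected = select_sheets(sheets, "What was revenue by region in Q3?")
print([s.name for s in selected])  # ['Q3 Revenue']
```

A production system would more plausibly score relevance with embeddings or with the model itself, but the shape is the same: summarize each sheet cheaply, rank the summaries, and forward only the top few.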
Second, it compresses the data itself. Rather than listing "N/A" across roughly 500 rows, it writes "N/A appears in rows 120-620." It preserves headers and section boundaries but strips repetitive stretches and empty cells.
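The compression idea resembles run-length encoding applied to a column. The sketch below is one way it might look, assuming a single column held as a Python list; the function name compress_column, the min_run threshold, and the "row N: value" format for uncompressed cells are illustrative choices, not details from Credal.

```python
from itertools import groupby


def compress_column(values, first_row=1, min_run=5):
    """Collapse long runs of identical values into one range note.

    Runs shorter than min_run are listed cell by cell, and empty cells
    ("" or None) are dropped. Both rules are assumptions for this sketch.
    """
    lines = []
    row = first_row
    for value, group in groupby(values):
        run_len = len(list(group))
        last = row + run_len - 1
        if value in ("", None):
            pass  # skip empty stretches entirely
        elif run_len >= min_run:
            lines.append(f"{value} appears in rows {row}-{last}")
        else:
            for r in range(row, last + 1):
                lines.append(f"row {r}: {value}")
        row = last + 1
    return lines


column = ["12", "15"] + ["N/A"] * 6 + ["", ""] + ["9"]
print(compress_column(column, first_row=2))
# ['row 2: 12', 'row 3: 15', 'N/A appears in rows 4-9', 'row 12: 9']
```

Eleven cells shrink to four lines here; on a column with hundreds of repeated values the savings in context-window tokens would be far larger.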
The Benchmark Numbers
Credal tested across two GPT generations with preprocessing enabled versus disabled:
- GPT 5.4: 79.1% accuracy without preprocessing, 87.6% with it (+8.5 points)
- GPT 5.2: 60.7% without, 82.4% with (+21.7 points)
The older model benefited far more, which makes sense: if GPT 5.2's native file handling is weaker, it has more room to gain from external preprocessing. But even GPT 5.4, which handles files better out of the box, still improved by 8.5 points.
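The deltas follow directly from the four published accuracy figures. The short check below recomputes them and also compares GPT 5.2 with preprocessing against GPT 5.4 without it; the dictionary layout and function names are just a convenient way to hold the numbers.

```python
# Accuracy figures (percent) as reported in the benchmark.
results = {
    "GPT 5.4": {"raw": 79.1, "preprocessed": 87.6},
    "GPT 5.2": {"raw": 60.7, "preprocessed": 82.4},
}


def point_gain(model: str) -> float:
    """Percentage-point improvement from preprocessing for one model."""
    r = results[model]
    return round(r["preprocessed"] - r["raw"], 1)


def cross_model_gap() -> float:
    """Older model with preprocessing minus newer model on raw files."""
    return round(
        results["GPT 5.2"]["preprocessed"] - results["GPT 5.4"]["raw"], 1
    )


for model in results:
    print(model, point_gain(model))
# GPT 5.4 8.5
# GPT 5.2 21.7

print(cross_model_gap())  # 3.3
```

The positive gap of 3.3 points means the older model with preprocessing scored higher than the newer model reading raw files.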
Beyond raw accuracy, the tool eliminated timeout failures that previously prevented answers entirely on complex files.
The takeaway here is practical: if you are piping spreadsheet data into any LLM, the formatting and compression of that data can matter as much as the model you choose. In this benchmark, a cheaper, older model with smart preprocessing (GPT 5.2 at 82.4%) outperformed a newer model working with raw files (GPT 5.4 at 79.1%). That is a useful finding for anyone building internal tools or workflows that touch enterprise data, and a reminder that prompt engineering extends beyond the text prompt itself to how you structure the data going in.