How well does Claude Code actually hold up when you throw real data work at it? Data analyst Richard Demsyn-Jones put it through three projects of increasing messiness and graded each one like a professor.
The results tell a useful story about where AI coding tools earn their keep and where they still need a human riding shotgun.
The Grades: A, B+, and B-Minus
The strongest showing was a sudoku rating system (Grade: A), where Demsyn-Jones needed an algorithm to rate puzzle difficulty on sudoku.com. Claude Code handled algorithm construction, evaluation frameworks, and data visualization while the author played reviewer. Having a clear vision, existing code to build on, and deep domain knowledge made this a near-perfect use case.
A workplace data analysis tool (Grade: B+) came next. Here, Claude Code was fed business context and database schemas, then asked to generate SQL queries, revenue reports, conversion funnel analysis, and charts. The SQL interpretation and visualization work was strong, but managing context across multiple instruction files got unwieldy as the project grew.
The weakest performance was policy research on Canada's daycare system (Grade: B-minus). When asked to do web research, consolidate sources, and generate a critical analysis of the CWELCC program, Claude Code produced what Demsyn-Jones described as "superficial analysis" that parroted government talking points. It took significant prompt reworking to push the tool toward genuine critical thinking rather than sycophantic summarization.
10 Workflow Principles That Actually Helped
Demsyn-Jones distilled his experience into ten practical rules, and several stand out for anyone doing data work with AI tools:
- Build comprehensive unit tests early. Give Claude Code a way to evaluate its own output, and quality jumps.
- Cache expensive steps. Don't let the tool re-run costly computations every iteration.
- Use tiered data analysis (raw data, then claims, then summaries) so the tool doesn't skip straight to conclusions.
- Manage context deliberately. CLAUDE.md files, rules documents, and reference docs each serve different purposes. Dumping everything into one place degrades performance.
- Grant permissions upfront. Constant approval prompts break flow and slow iteration.
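The "cache expensive steps" rule is easy to apply in Python. Below is a minimal sketch, not Demsyn-Jones's actual setup. It assumes the costly step is a function whose arguments are JSON-serializable and whose result can be pickled. The names `disk_cache` and `expensive_summary` and the `.cache` directory are hypothetical. Each result is stored on disk under a hash of the function name and arguments, so a re-run with the same inputs loads the saved result and skips the computation.

```python
import functools
import hashlib
import json
import pickle
from pathlib import Path


def disk_cache(cache_dir=".cache"):
    """Hypothetical decorator: cache a function's result on disk, keyed by name and arguments."""
    cache_path = Path(cache_dir)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a stable key from the function name and its arguments.
            key_src = json.dumps([func.__name__, args, kwargs], sort_keys=True, default=str)
            key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
            path = cache_path / f"{key}.pkl"
            if path.exists():
                # Cache hit: load the saved result instead of recomputing.
                with path.open("rb") as f:
                    return pickle.load(f)
            # Cache miss: compute once, then persist for later iterations.
            result = func(*args, **kwargs)
            cache_path.mkdir(parents=True, exist_ok=True)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


@disk_cache()
def expensive_summary(n):
    # Hypothetical stand-in for a slow query, scrape, or model run.
    return sum(i * i for i in range(n))
```

One design caveat: the cache key covers only the function name and its arguments, not the function body or the underlying data. After changing either, delete the cache directory so stale results aren't reused.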
The Real Takeaway
The pattern across all three projects is consistent: Claude Code performs best when the human brings domain expertise and clear quality standards. It's a force multiplier for judgment, not a replacement for it.
As Demsyn-Jones put it, quoting researcher Ethan Mollick: "The people who thrive will be the ones who know what good looks like - and can explain it clearly enough that even an AI can deliver it."
That tracks with what we see across AI coding tools generally. The gap between a B-minus research output and an A-grade algorithm project wasn't the tool; it was how well the human could define "done" before starting. Anyone considering Claude Code for data analysis should start with a project where they already know what good output looks like. That's where the productivity gains are real and immediate.