
Claude Code Hits 92% on Bioinformatics Tasks Without Model Retraining

April 9, 2026 2 min read


What happens when you treat an AI model like a skilled intern with a reference manual instead of retraining it from scratch?

A GitHub project called SciAgent-Skills has published benchmark results showing Claude Code reaching 92% accuracy on bioinformatics tasks - without fine-tuning (retraining the model on specialized scientific data) and without RAG (retrieval-augmented generation, where you feed the model relevant documents at query time to fill knowledge gaps). The approach is simpler: a library of structured "skills" - detailed, domain-specific instruction sets that tell Claude Code exactly how to handle bioinformatics workflows. Think of it as giving the model a well-organized lab manual instead of sending it to graduate school.
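To make the idea of a structured skill concrete, here is a minimal Python sketch of how such an instruction set could be represented and placed ahead of a task. It does not show the actual SciAgent-Skills file format, which is not described here: the `Skill` class, its fields, `build_prompt`, and the FASTA example are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical structured skill: plain instructions, no model changes."""
    name: str
    steps: list[str]                                    # ordered how-to instructions
    pitfalls: list[str] = field(default_factory=list)   # known failure modes

    def render(self) -> str:
        """Format the skill as plain text a model can read."""
        lines = [f"Skill: {self.name}", "Steps:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        if self.pitfalls:
            lines.append("Pitfalls:")
            lines += [f"  - {p}" for p in self.pitfalls]
        return "\n".join(lines)


def build_prompt(task: str, skills: list[Skill]) -> str:
    """Place the rendered skills ahead of the task description."""
    sections = [skill.render() for skill in skills] + [f"Task: {task}"]
    return "\n\n".join(sections)


# Illustrative skill content (an assumption, not taken from the project).
fasta_skill = Skill(
    name="Validate a FASTA file before analysis",
    steps=[
        "Check that every record starts with a '>' header line.",
        "Reject sequences containing characters outside the expected alphabet.",
    ],
    pitfalls=["Do not assume one line per sequence; sequences may wrap."],
)

prompt = build_prompt("Count the sequences in reads.fa", [fasta_skill])
```

The point of the sketch is that all of the domain knowledge lives in editable text, so updating a workflow means editing a skill rather than retraining anything.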

What the Benchmark Measures

The bioinformatics tasks in SciAgent-Skills include gene sequence analysis, protein structure prediction pipelines, and data format conversions common in genomic research. These are exactly the kinds of tasks where domain knowledge matters most - wrong tool parameters or incorrect file format assumptions can silently corrupt results without any obvious error message.
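One well-known instance of this kind of silent error, chosen here as an illustration rather than taken from the benchmark, is interval coordinates. BED files use 0-based, half-open coordinates, while GFF3 uses 1-based, inclusive coordinates. Copying the numbers across unchanged raises no error but shifts every feature by one base. The function name below is hypothetical.

```python
def bed_to_gff3(start: int, end: int) -> tuple[int, int]:
    """Convert a BED interval (0-based, half-open) to GFF3 (1-based, inclusive)."""
    if start < 0 or end <= start:
        # Zero-length BED features need separate handling and are rejected here.
        raise ValueError(f"unsupported BED interval: {start}-{end}")
    return start + 1, end


# The first 100 bases of a sequence, as written in a BED file.
bed_start, bed_end = 0, 100
gff_start, gff_end = bed_to_gff3(bed_start, bed_end)

# Both representations must describe the same number of bases.
bed_length = bed_end - bed_start          # half-open: end - start
gff_length = gff_end - gff_start + 1      # inclusive: end - start + 1
```

A skill that states the coordinate convention of each format explicitly is the sort of instruction that would steer a general model away from the unchanged-copy mistake.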

The project argues the accuracy improvement over baseline Claude Code is large enough to challenge the assumption that specialized scientific AI requires model retraining. The exact baseline number isn't prominently published in the repository, so the size of the gap can't be read off directly, but the authors considered it substantial enough to build a full skill library around the finding.

Prompting Instead of Training

Fine-tuning is typically used in specialized domains because general models don't know the exact syntax, tools, and naming conventions of a narrow field. SciAgent-Skills argues you can close most of that gap with structured prompting instead.

The technique isn't specific to biology. Financial analysts running the same types of calculations repeatedly, legal teams processing similar document types, or marketing teams running standardized competitive analysis could apply the same approach. Write sufficiently detailed domain-specific instructions, and general models perform closer to specialist ones - at a fraction of the cost of training or maintaining a specialized model.

Where This Doesn't Help

This approach works best for well-defined, repeatable tasks. Novel research problems - where the right method itself is unknown - still benefit from models with deeper domain training. And maintaining a skills library requires ongoing work: as tools change and new methods emerge, the instruction sets need updating.

For teams doing routine specialized work with AI coding tools, though, this is a meaningful data point. Before investing in specialized model training, it's worth testing how far a well-designed set of domain instructions can take a general model. SciAgent-Skills suggests the answer, at least in bioinformatics, is quite far.
