Related ToolsClaudeClaude Code

Four Hard-Won Rules for Adding AI Features That Actually Work

AI news: Four Hard-Won Rules for Adding AI Features That Actually Work

Most AI feature launches follow a predictable arc: ship a prototype where the LLM does everything, watch it break in production, then spend months figuring out which parts actually needed AI in the first place.

A recent write-up from developer Alex Ghiculescu, who built an LLM-based data validation system, lays out a pattern that more teams should steal. The core lesson: use the model for as little as possible.

Let the LLM Parse, Let Code Decide

Ghiculescu's first version handed everything to the language model - parsing user input, applying business rules, returning results. It was slow, inconsistent, and hard to debug. The fix was reducing the LLM's job to one thing: converting messy user input into structured rules. After that, plain old deterministic code handles the actual validation.

This is the pattern that keeps showing up in production AI systems. The model is good at understanding fuzzy human language. It's bad at reliably applying logic the same way twice. Split those jobs accordingly.

Test Prompts Like You Test Code

The approach to testing is pragmatic. Ghiculescu writes unit tests that call the Anthropic API directly, but keeps them out of CI pipelines (continuous integration - the automated checks that run when you push code). They require an API key, they cost money per run, and LLM outputs are inherently variable. Instead, these tests run manually when iterating on prompts, giving developers confidence without racking up bills or creating flaky builds.

This sits in a middle ground that most teams haven't found yet. Either they don't test their prompts at all, or they build elaborate evaluation frameworks that nobody maintains.

Users Want to See the Work

The most underappreciated lesson: users don't want magic. They want to understand how results were derived. Ghiculescu references the concept of "working with a black box" - when an AI feature produces an output, users need enough transparency to trust it and enough control to correct it.

This tracks with what we see across AI tools broadly. The products gaining traction aren't the ones with the most impressive demos. They're the ones that show their reasoning, let users edit intermediate steps, and make it obvious when the AI isn't confident.

The write-up also borrows from a pattern popularized by Claude Code's CLAUDE.md files, where developers document mistakes so the AI won't repeat them. Ghiculescu adapts this to business rule validation - essentially building a growing knowledge base of edge cases that the system learns from over time.

None of these ideas are new individually. But seeing them applied together in a shipping product reinforces a simple truth: the hard part of AI features isn't the model. It's everything around it.