Models

A macOS Developer Tested Claude on Real Projects. Here's What Held Up.

March 8, 2026 2 min read

Image: Anthropic

A macOS developer spent months using Claude for real shipping products and came away with a nuanced verdict: it's genuinely useful for experienced developers, but every line still needs human review.

The most telling test involved building a window management feature called "Stages" for a macOS app. The developer fed Claude a 200-line markdown specification, and the model produced code that compiled without errors on the first attempt. It refactored UI components, implemented window hiding workarounds, and reverse-engineered another developer's window manager API from GitHub source code.

That sounds impressive, and it is. But context matters.

Where Claude Delivers

Claude's sweet spot is adaptation work - taking existing, working code and reshaping it for a new context. When given reference code and clear specifications, it produced readable, functional output fast. In one case, Claude learned an unfamiliar programming language (Cherri, used for Apple Shortcuts) from documentation alone and generated 98 functional Shortcuts in under 10 minutes.

For boilerplate-heavy tasks like setting up menubar app architecture, auto-update frameworks, and license verification flows, Claude handled the grunt work reliably. These are tasks that experienced developers can do but find tedious - exactly the kind of work where AI assistance pays off.

Where It Falls Short

The cracks appeared on tasks requiring obscure private APIs or undocumented system behaviors. When there was no existing code to reference, Claude needed detailed explanations of non-standard functionality before it could produce anything useful. This makes sense: the model works from patterns in its training data, and private macOS APIs aren't well-represented there.

More concerning was a pattern where generated code looked functional in the UI but lacked internal logic. Features appeared in the interface but did nothing when triggered - the kind of bug that passes a visual check but fails in practice.

The developer also noted a skill gap problem: non-developers trying to use Claude for coding struggled because they lacked the technical vocabulary to guide it effectively. You need to know what to ask for, which means you need to already understand the problem domain.

The Honest Assessment

The conclusion mirrors what many professional developers are finding: Claude is a productivity multiplier, not a replacement. It handles the 60% of coding that's pattern-matching and adaptation. The remaining 40% - architectural decisions, debugging edge cases, working with undocumented systems - still requires a human who understands the codebase.

The developer described moving from skepticism to "cautious acceptance" while acknowledging they might be "in denial" about where this is heading long-term. That tension between present limitations and future trajectory is the honest state of AI-assisted development in early 2026.

Where Claude Delivers

Where It Falls Short

The Honest Assessment

Related Tools

More from today

Luma AI Launches Uni-1, a Single Model That Both Understands and Generates Images

Qwen3.5 4B Reportedly Matches GPT-4o on Benchmarks With 4 Billion Parameters

ChatGPT's Responses Are Starting to Sound Like Infomercials

Cookie Preferences