Most discussions of production AI costs split between two camps: founders saying it's affordable and engineers saying it blew their budget. One developer spent five months running Claude Haiku 4.5 on AWS Bedrock and posted the actual numbers.
The setup: a Next.js application on AWS Amplify, connected to Claude Haiku 4.5 through AWS Bedrock - Amazon's managed service for accessing AI foundation models without building your own inference infrastructure. Two production workloads: a customer-facing chat widget built on RAG (retrieval-augmented generation, where the model pulls relevant chunks from a document library before answering rather than drawing purely on its training data), backed by about 16 documents, plus a second internal application.
Why Haiku Changes the Cost Picture
The model choice is the most consequential decision in this stack. Claude Haiku is Anthropic's fastest and cheapest tier - designed for high-volume tasks where speed and cost efficiency matter more than maximum reasoning quality. For a customer-facing chat widget answering questions from a bounded set of 16 documents, Haiku is the right call. Sonnet or Opus would add significant per-query cost without meaningfully improving answers drawn from a small, curated knowledge base.
AWS Bedrock charges by token - a token being roughly three-quarters of a word, so a 1,000-word document runs about 1,300 tokens. Input and output tokens are priced separately. In a RAG setup, input costs dominate: every query sends the user's question plus retrieved document chunks to the model before any response is generated. Sixteen documents is a tight scope, which caps the per-query token count and keeps billing predictable.
What Five Months of Data Actually Shows
Single-month billing snapshots are almost useless for planning. Production traffic is rarely flat - month one tends to be low while users discover the product, then volume climbs as engagement grows. Five months captures that ramp-up and shows steady-state costs more honestly than any single billing cycle.
Query volume, more than any other variable, determines what production AI costs. A customer-facing chat widget at 100 queries a day runs at a fraction of the cost of one at 10,000. The most common planning failure is modeling API costs on documentation examples rather than actual user behavior, then getting surprised when traffic grows.
Real billing data from someone who made deliberate architecture choices - Haiku over Sonnet, a tight knowledge base, managed infrastructure on Bedrock - is the kind of reference that actually helps when building something similar.