A San Diego startup called TokenSurf launched a proxy service that sits between your app and LLM providers, automatically routing simple queries to cheaper models while keeping complex ones on your original expensive model.
The pitch: change your OpenAI SDK's base URL to api.tokensurf.io/v1 and the service classifies each request. "What is 2+2?" gets routed to GPT-4o Mini instead of GPT-4o. "Write me a React app" stays on whatever model you specified. You bring your own API keys.
The savings claims vary widely depending on which model you're downgrading from. Routing GPT-4 legacy calls to GPT-4o Mini saves 99% per million tokens (from $60 to $0.60 on output). Routing Claude Sonnet to Claude Haiku 3.5 saves about 73%. The realistic range for most current-model users is probably 50-76%, since few people are still paying GPT-4 legacy prices.
TokenSurf supports OpenAI, Anthropic, Google, and OpenRouter across 70+ models. Pricing starts free at 1,000 requests per month, then $0.001 per request on pay-as-you-go, dropping to $0.0006 per request at the $3,000/month scale tier.
The concept isn't new. Smart routing and model cascading have been features in platforms like OpenRouter and various open-source tools for a while. The differentiator TokenSurf is pushing is simplicity: no SDK changes, no code refactoring, just a URL swap. Whether the classification accuracy holds up under production workloads - where the line between "simple" and "complex" queries gets blurry - is the real question. A misrouted complex query that gets a bad answer from a cheap model costs more than the money you saved on it.