AI News
AI news that matters. Updated daily.
AMD Updates GAIA App to Support Custom AI Agents Built Through Chat
AMD updated its GAIA application - short for Generative AI with AMD - to support building custom AI agents through a chat interface, and repositioned the software as a "true desktop app" rather than a technical preview.
Berkeley Researchers Show AI Agent Benchmarks Can Be Systematically Gamed
What does an AI agent's top score on a major benchmark actually prove? According to researchers at UC Berkeley's RDI center, it might prove less than the industry assumes.
Claude's Quality Problem: Why Paying Users Are Losing Confidence
Six months ago, Claude was the AI model known for careful, deliberate outputs - the one that didn't break your code or rewrite things you didn't ask it to touch. That reputation is under pressure now, and the criticism is coming from the people paying the most to use it.
Claude API Users Are Getting 50% of Advertised Capacity, Proxy Data Shows
11,505 API calls. Seven days. One header value that never changed: fallback-percentage: 0.5.
Alibaba Pivots Away from Open-Source AI to Focus on Revenue
Last year, Alibaba was releasing open-source AI models at a pace that kept the developer community paying close attention. Now, according to the Financial Times, the company is reconsidering that strategy in favor of generating actual revenue from its AI investments.
Sam Altman Responds to New Yorker Profile Questioning His Trustworthiness
Sam Altman published a blog post Friday responding to a New Yorker profile that called his character into question and to what he described as an attack on his home, addressing both in the same post.
One Saturday, One Developer, and the Research McKinsey Charges $300K For
One Saturday. A competitive market analysis that major consulting firms charge hundreds of thousands of dollars to deliver. One developer, a web crawling API, and Claude.
AMD Director Documents Claude Code's Decline: 7,000 Sessions of Data
Nearly 7,000 coding sessions. That's the sample size behind what may be the most detailed public documentation of an AI coding tool degrading in real time.
Claude's thinking_mode and reasoning_effort API Tags Confirmed Real
Claude's API accepts two underdocumented control tags - <thinking_mode> and <reasoning_effort> - that let developers adjust how much the model reasons before generating a response. A developer confirmed both tags work on enterprise Claude subscriptions, with API responses explicitly surfacing these parameters.
Gemma 4 31B vs Qwen 3.5 27B: Which Handles Long Documents Better?
Two open-weight models are drawing serious attention from users who run AI locally: Google's Gemma 4 31B and Alibaba's Qwen 3.5 27B. The comparison centers on long context performance - a practical concern for anyone processing lengthy documents, large codebases, or extensive research without sending data to a third-party API.
Six Months Using AI Daily: What Actually Works, What Doesn't, and What Quietly Erodes
What happens when a practitioner uses AI tools for every single task - every email, every research project, every first draft - for six months straight? The results are messier than the marketing suggests.
Anthropic Enforces Age Policy, Locking Out Under-18 Users
Anthropic is now enforcing its minimum age requirement more actively, locking accounts of users under 18 after identifying age violations through conversation history.
Anthropic Releases Managed Agents API in Public Beta at $0.08 Per Session-Hour
Anthropic just shipped Claude Managed Agents into public beta - a set of composable APIs designed to handle the infrastructure overhead that makes building production AI agents painful.