When an AI agent tries to click a button on your website, it's guessing. It reads the HTML, processes ARIA labels if they exist, and infers from visual context what each element does. Sometimes that inference is right. Often it isn't - and the agent either fails silently or triggers the wrong action entirely.
The proposed fix: a structured file that developers publish alongside their UI, explicitly describing what each component is, what it does, and how it behaves. Think of it as robots.txt for interactive interfaces - except instead of telling crawlers what to avoid, you're telling agents what everything actually means.
The Problem With Guessing
Current AI coding and automation agents - including browser automation tools, computer use features in Claude, and agent frameworks like Cursor - piece together UI intent from whatever's available: HTML structure, element labels, screenshots, and model reasoning. This works well enough for standard HTML forms. It breaks down on custom components, SPAs (single-page apps that build the interface dynamically using JavaScript instead of loading it from a server), and anything with non-obvious interaction patterns.
A dropdown styled as a custom div, a modal that requires a specific click sequence, a button that's present in the page code but visually hidden - these trip up agents constantly because the underlying HTML gives no reliable signal about what the element actually does.
Developers who've built agent-powered workflows often end up writing elaborate workarounds in their prompts just to handle specific UI quirks: "When you see the green button in the top right, that's the save button, not the publish button." That's a patch, not a fix.
Why a File Format Might Actually Work
The llms.txt movement started in late 2024 with a simple idea: if you want AI to understand your site's content, give it a clean, structured version alongside the regular site. A meaningful number of developer tools and documentation sites have adopted it since.
A UI description file extends this to interactions. A developer declares once: "button ID checkout-confirm triggers final purchase, requires payment-form-complete state, cannot be undone." Any agent reading that file before interacting with the interface has context that would otherwise take several failed attempts to infer.
The challenge is standardization. For this to be useful, agent frameworks need to agree on a schema and actually consume the file. Right now, the concept is still community-level - there's no formal spec and no major agent platform has committed to reading one.
But for teams already building internal tools with AI agents, maintaining your own UI description document just for your own agents is worth the few hours it takes. You'll spend less time debugging agent failures and less time writing prompt workarounds.