
CIRK Proposes Replacing Story Points with AI-Aware Task Scoring


What happens to story points when an AI agent writes most of the code?

That is the question behind CIRK, a new open standard from Portuguese startup Core618 that proposes replacing traditional effort estimation with a scheme built for AI-assisted development. Instead of asking "How many story points is this task?", CIRK asks "How much human oversight does this task need?"

The acronym breaks down into four dimensions: Context (how much system knowledge is required), Iteration (how many revision cycles to expect), Review (what level of human validation is needed), and Integration Risk (how sensitive the deployment is). Each dimension is scored from 1 to 3, producing a vector like [C2, I3, R2, K2] and a composite score, the sum of the four, that ranges from 4 to 12.
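To make the scoring concrete, here is a minimal Python sketch that parses a vector written in the article's notation and computes the composite as a plain sum, which matches the worked examples later in the piece. The regex, the `CirkVector` class, and `parse_vector` are hypothetical names chosen for illustration; the article does not describe CIRK's actual file format or tooling.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar for the "[C2, I3, R2, K2]" notation shown in the article.
# This is an assumption for illustration, not the CIRK spec's own syntax.
_VECTOR_RE = re.compile(
    r"^\[\s*C([1-3])\s*,\s*I([1-3])\s*,\s*R([1-3])\s*,\s*K([1-3])\s*\]$"
)


@dataclass(frozen=True)
class CirkVector:
    context: int      # C: how much system knowledge is required (1-3)
    iteration: int    # I: how many revision cycles to expect (1-3)
    review: int       # R: what level of human validation is needed (1-3)
    integration: int  # K: how sensitive the deployment is (1-3)

    def __post_init__(self) -> None:
        # Reject anything outside the 1-3 range for each dimension.
        for name, value in vars(self).items():
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2, or 3, got {value}")

    @property
    def composite(self) -> int:
        # Plain sum of the four scores, so the result always lies in 4..12.
        return self.context + self.iteration + self.review + self.integration


def parse_vector(text: str) -> CirkVector:
    """Parse a string such as "[C2, I3, R2, K2]" into a CirkVector."""
    match = _VECTOR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a CIRK vector: {text!r}")
    c, i, r, k = (int(g) for g in match.groups())
    return CirkVector(c, i, r, k)


print(parse_vector("[C2, I3, R2, K2]").composite)  # 9
```

Validating each dimension at construction time means an out-of-range score fails loudly instead of silently producing a composite outside 4 to 12.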

A score of 4-5 means the AI agent can run autonomously with auto-approval. A score of 10-11 means a human needs to approve every step. Score a 12 and the task needs to be broken into smaller pieces before anyone, human or AI, touches it.
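The composite-to-oversight mapping can be sketched as a simple banding function. Only the 4-5, 10-11 and 12 bands come from the article; the 6-9 range is not described there, so this sketch returns a hypothetical `UNSPECIFIED` placeholder rather than inventing a rule. The `Oversight` enum and `oversight_for` function are illustrative names, not part of the standard.

```python
from enum import Enum


class Oversight(Enum):
    AUTONOMOUS = "agent runs autonomously with auto-approval"  # composite 4-5
    UNSPECIFIED = "middle band, not described in the article"  # composite 6-9 (placeholder)
    STEPWISE_APPROVAL = "human approves every step"            # composite 10-11
    DECOMPOSE = "break the task up before starting"            # composite 12


def oversight_for(composite: int) -> Oversight:
    """Map a composite CIRK score (4..12) to an oversight band.

    The 6-9 band is a placeholder because the article does not say
    how those scores are handled.
    """
    if not 4 <= composite <= 12:
        raise ValueError(f"composite must be between 4 and 12, got {composite}")
    if composite <= 5:
        return Oversight.AUTONOMOUS
    if composite <= 9:
        return Oversight.UNSPECIFIED
    if composite <= 11:
        return Oversight.STEPWISE_APPROVAL
    return Oversight.DECOMPOSE


for score in (4, 7, 10, 12):
    print(score, oversight_for(score).name)
```

Keeping the undefined band explicit makes it obvious where a team adopting this would have to fill in its own policy.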

The clever part is that individual dimension scores can trigger mandatory policies regardless of the total. A task with K3 (high deployment risk) requires a staged rollout plan even if the composite score looks moderate. An I3 (high iteration uncertainty) forces the agent into draft-first mode where it proposes a plan before writing code. This makes the scoring system usable as policy-as-code that automated tools can enforce.
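The per-dimension triggers amount to a policy-as-code rule set that checks individual scores and ignores the total. This minimal sketch encodes only the two triggers the article names (I3 forces draft-first mode, K3 requires a staged rollout plan). The policy identifiers `draft_first` and `staged_rollout`, the function name, and the dict-of-letters input are hypothetical choices for illustration.

```python
def mandatory_policies(scores: dict[str, int]) -> list[str]:
    """Return the policies forced by individual dimension scores.

    `scores` maps the dimension letters "C", "I", "R", "K" to values 1-3.
    The composite score is deliberately not consulted: a single high
    dimension is enough to trigger a policy.
    """
    policies = []
    if scores.get("I") == 3:
        # High iteration uncertainty: the agent proposes a plan before writing code.
        policies.append("draft_first")
    if scores.get("K") == 3:
        # High deployment risk: a staged rollout plan is required.
        policies.append("staged_rollout")
    return policies


# A moderate composite (2 + 1 + 2 + 3 = 8) still triggers the K3 policy.
print(mandatory_policies({"C": 2, "I": 1, "R": 2, "K": 3}))  # ['staged_rollout']
```

Because the rules are plain predicates over the vector, a CI check or agent harness could evaluate them before a task is handed to an agent.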

Practical example: a read-only status endpoint scores [C1, I1, R1, K1] = 4 and can be fully automated. A billing webhook change scores [C2, I2, R3, K3] = 10 and needs step-by-step human supervision.
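Putting the pieces together, the sketch below evaluates the two example tasks: it sums the vector, assigns an oversight band, and applies the I3 and K3 triggers. As before, the `Task` class, the `evaluate` function, the mode strings, and the policy labels are hypothetical, and the 6-9 band is left undefined because the article does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    c: int  # Context
    i: int  # Iteration
    r: int  # Review
    k: int  # Integration Risk


def evaluate(task: Task) -> dict:
    """Compute composite, oversight band, and triggered policies for a task."""
    composite = task.c + task.i + task.r + task.k
    if composite <= 5:
        mode = "autonomous, auto-approved"
    elif composite >= 12:
        mode = "decompose first"
    elif composite >= 10:
        mode = "human approves every step"
    else:
        mode = "middle band (not described in the article)"
    # Per-dimension triggers apply regardless of the composite.
    triggers = (("draft_first", task.i == 3), ("staged_rollout", task.k == 3))
    policies = [name for name, hit in triggers if hit]
    return {"composite": composite, "mode": mode, "policies": policies}


tasks = [
    Task("read-only status endpoint", c=1, i=1, r=1, k=1),
    Task("billing webhook change", c=2, i=2, r=3, k=3),
]
for t in tasks:
    print(t.name, evaluate(t))
```

Under the K3 rule described earlier in the article, the billing webhook change would also pick up a staged-rollout requirement on top of step-by-step approval, while the status endpoint triggers nothing.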

The standard is MIT-licensed and lives on GitHub, though it is extremely early-stage, with zero stars and a single contributor as of publication. The concept is sound: as AI coding agents like Cursor, Claude Code, and Aider become the default way code gets written, teams need a framework for deciding what gets rubber-stamped and what gets careful human review. Story points were never designed to answer that question.

Whether CIRK specifically gains traction matters less than the problem it identifies. Every engineering team using AI agents is going to need some version of this classification, whether they call it CIRK or build their own.