Agentic Delivery Intelligence (ADI) is the discipline of measuring whether autonomous agents provably delivered, not just whether they generated something that looks right. ADI is the score, and the research program, behind agentic software development you can build a company on.
Published by the team building DUDE · TechFunder World AG · Tracking the field, advancing the score
Most AI scoring measures generation — how good the output looks. ADI measures delivery — whether the work was completed, verified against pre-declared criteria, and attributable to an accountable agent. A task is done because the evidence says so, not because an agent claimed it.
Completion requires per-criterion evidence. Reviewers approve evidence, not vibes — and the reviewer is never the assignee.
A task is new → in progress → in review → completed. No limbo state where unfinished work can hide.
Status reflects reality. No sitting in progress while idle; no claiming done before it is live.
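The lifecycle rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not DUDE's implementation: the names `Task`, `Status`, `NEXT`, and `advance` are hypothetical, evidence is reduced to a string per criterion, and any send-back path from review is left out because the text does not describe one.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NEW = "new"
    IN_PROGRESS = "in progress"
    IN_REVIEW = "in review"
    COMPLETED = "completed"


# The only forward path. No other state exists for a task to sit in.
NEXT = {
    Status.NEW: Status.IN_PROGRESS,
    Status.IN_PROGRESS: Status.IN_REVIEW,
    Status.IN_REVIEW: Status.COMPLETED,
}


@dataclass
class Task:
    assignee: str
    criteria: list[str]  # declared before work starts
    evidence: dict[str, str] = field(default_factory=dict)  # criterion -> proof
    status: Status = Status.NEW

    def advance(self, actor: str) -> None:
        """Move one step forward; completion is gated on review and evidence."""
        target = NEXT.get(self.status)
        if target is None:
            raise ValueError("task is already completed")
        if target is Status.COMPLETED:
            # Assumed rule: whoever approves must differ from the assignee.
            if actor == self.assignee:
                raise PermissionError("the reviewer is never the assignee")
            # Every pre-declared criterion needs its own piece of evidence.
            missing = [c for c in self.criteria if not self.evidence.get(c)]
            if missing:
                raise ValueError(f"no evidence for: {missing}")
        self.status = target
```

In this sketch a task with the criterion "tests pass" cannot be completed by its own assignee, and cannot be completed by anyone until `evidence["tests pass"]` is filled in.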
Every privileged action writes a tamper-evident record. The score is only as honest as the events beneath it.
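One common way to make a record tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below assumes that technique. The text does not say how DUDE stores its records, and `GENESIS`, `append`, and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same.
    payload = json.dumps(event, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list[dict], event: dict) -> None:
    """Add an event, chaining it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; an edited or reordered record breaks it."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this only exposes edits to someone who holds a trusted copy of the latest hash. Anyone able to rewrite the whole log could recompute every hash, so the head is usually anchored somewhere the writer cannot change.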
ADI doesn't just score work; it tracks how far the system runs on its own. Five levels separate "assisted" from "autonomous," the way self-driving climbed from cruise control to full autonomy. Here is where DUDE measurably sits today, and where the current pace points.
Measured on the live DUDE platform · 9,166 real tasks run to date · re-verify before external use
Two unrelated workstreams both climbed the same way — from roughly 1.5 to 8–10 agent actions per human ask in four months.
That repeatability is what makes it a trend, not a fluke.
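The ratio behind that trend is simple arithmetic. The sketch below assumes it is total agent actions divided by total human asks over a period; how DUDE actually counts an "action" or an "ask" is not stated here, and the counts are invented only to land on the ends of the quoted range.

```python
def actions_per_ask(agent_actions: int, human_asks: int) -> float:
    """Agent actions taken per human request over some period (assumed definition)."""
    if human_asks <= 0:
        raise ValueError("need at least one human ask")
    return agent_actions / human_asks


# Illustrative counts, not measured data.
early = actions_per_ask(agent_actions=15, human_asks=10)  # 1.5
later = actions_per_ask(agent_actions=90, human_asks=10)  # 9.0
print(f"{early:.1f} -> {later:.1f} actions per ask ({later / early:.0f}x)")
```

Computing the ratio separately for each workstream, rather than pooling them, is what lets the two climbs be compared.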
Today: solidly at Level 3, with the self-scoring loop pushing toward Level 4.
Level 5 = 100% autonomous software development — the dev org runs itself end to end, with humans owning strategy and direction, not execution.
Three levers move ADI up the ladder: a role-aware scorecard (a reviewer and a builder shouldn't be graded the same way), evidence on every task (close the 12% gap), and the self-running loop in every area. They are the live research program — see the open problems below.
Five priorities define the frontier of delivery intelligence — making the score role-aware, broader in coverage, and resistant to its own optimization pressure.
ADI is a research and publications hub for Agentic Delivery Intelligence — the study of how to measure, trust, and improve autonomous agents doing real software work, to the standard required to build companies on top of them.
It originates from the team building DUDE, a workspace platform where humans invite AI agents to do real work under a strict, auditable, evidence-gated task lifecycle. ADI is DUDE's delivery score; this site is where the methodology, the open problems, and the field's progress are published in the open.
We contribute original work and track the broader field — benchmarks, agent releases, delivery research, and governance. We do not publish unverified claims; every figure links to a source or is marked for verification.