ADI

What is ADI

The thesis

Most AI scoring measures generation — how good the output looks. ADI measures delivery — whether the work was completed, verified against pre-declared criteria, and attributable to an accountable agent. A task is done because the evidence says so, not because an agent claimed it.

Evidence over assertion

Completion requires per-criterion evidence. Reviewers approve evidence, not vibes — and the reviewer is never the assignee.
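
A minimal Python sketch of this gate, under stated assumptions: the names `Task`, `missing_evidence`, and `can_complete` and the field layout are hypothetical illustrations, not DUDE's actual data model. It assumes only what the text states: every pre-declared criterion needs evidence, and the reviewer must not be the assignee.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative assumptions."""
    assignee: str
    criteria: list[str]  # pre-declared acceptance criteria
    # Maps each criterion to the evidence items attached to it.
    evidence: dict[str, list[str]] = field(default_factory=dict)


def missing_evidence(task: Task) -> list[str]:
    """Return the criteria that have no evidence attached yet."""
    return [c for c in task.criteria if not task.evidence.get(c)]


def can_complete(task: Task, reviewer: str) -> bool:
    """Allow completion only if every criterion has evidence
    and the reviewer is someone other than the assignee."""
    if reviewer == task.assignee:
        return False
    return not missing_evidence(task)


task = Task(
    assignee="agent-7",
    criteria=["tests pass", "deployed to staging"],
    evidence={"tests pass": ["ci-run-log"]},
)
print(missing_evidence(task))                # criteria still lacking evidence
print(can_complete(task, reviewer="alice"))  # blocked until evidence is complete
```

Keeping the check per criterion, rather than as a single "has evidence" flag, is what lets a reviewer see exactly which criterion is still unproven.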

No parking

A task is new → in progress → in review → completed. No limbo state where unfinished work can hide.
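
The lifecycle above can be sketched as a small state machine. This is an illustration under assumptions, not DUDE's implementation: the `Status` enum, the `ALLOWED` table, and `advance` are hypothetical names, and the sketch permits only the forward path named in the text.

```python
from enum import Enum


class Status(Enum):
    NEW = "new"
    IN_PROGRESS = "in progress"
    IN_REVIEW = "in review"
    COMPLETED = "completed"


# Assumption: only the forward path from the text is allowed.
# A real system might also allow a review rejection back to "in progress".
ALLOWED = {
    Status.NEW: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.COMPLETED},
    Status.COMPLETED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = Status.NEW
for step in (Status.IN_PROGRESS, Status.IN_REVIEW, Status.COMPLETED):
    status = advance(status, step)
print(status.value)
```

Because the table lists every legal move, there is no "parked" or "blocked" value a task could be moved into: any state outside the four simply cannot be represented.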

Honest timeline

Status reflects reality. A task does not sit in progress while idle, and nothing is claimed done before it is live.

Audited by construction

Every privileged action writes a tamper-evident record. The score is only as honest as the events beneath it.
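
One common way to make a record tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below illustrates that general technique only; the text does not say which mechanism DUDE uses, and the names `append`, `verify`, and `GENESIS` and the record layout are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous record's hash together with this event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the hash of the last record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash in order; any edited record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        expected = _digest(prev_hash, record["event"])
        if record["prev"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True


log: list[dict] = []
append(log, {"actor": "alice", "action": "approve", "task": 1})
append(log, {"actor": "agent-7", "action": "complete", "task": 1})
print(verify(log))  # chain is intact

log[0]["event"]["actor"] = "mallory"  # tamper with an earlier record
print(verify(log))  # verification now fails
```

A chain like this only detects edits if the latest hash is also stored or signed somewhere the editor cannot reach; otherwise the whole chain could be recomputed.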

The autonomy journey

Progress & forecast

ADI doesn't just score work — it tracks how far the system runs on its own. Five levels separate "assisted" from "autonomous," the way self-driving climbed from cruise control to full autonomy. Here is where DUDE sits today, measured, and where today's pace points.

95.8 · ADI quality score, holding · rose 60 → 96, held by the loop

10.2:1 · Agent actions per human ask · the system needs you less

+230% · Self-scoring in 5 days · 40 → 132 daily check-ins

12% · One honest gap · evidence step not yet everywhere

Measured on the live DUDE platform · 9,166 real tasks run to date · re-verify before external use
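
The +230% figure follows from the 40 → 132 daily check-ins by ordinary relative change. A short check, where `percent_change` is just an illustrative helper name:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# 40 daily check-ins growing to 132, rounded to a whole percent.
print(round(percent_change(40, 132)))
```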

Agent actions per human ask · two independent workstreams · Feb–Jun

Legend: Workstream A · Workstream B

Same curve, twice.

Two unrelated workstreams both climbed the same way — from roughly 1.5 to 8–10 agent actions per human ask in four months.

That repeatability is what makes it a trend, not a fluke.

Why it keeps improving — the self-scoring loop

an agent does the work

→ PROVE: it shows evidence it's done

→ CHECK: someone else approves it

→ SCORE: the result updates ADI

→ IMPROVE: the next round starts smarter

↻ loops back to step 1; runs on its own, no babysitting

The result: ADI went from 60 to 96 and now holds there — the loop did that, not a person nudging numbers.

Level 1 · Assisted · Passed
Agents help; humans do most of the work.

Level 2 · Supervised · Passed
Agents work; humans approve each step.

Level 3 · Delegated · ◆ DUDE is here
Agents run tasks end to end; humans spot-check results.

Level 4 · Self-running
Agents run whole workflows; humans set goals.

Level 5 · Autonomous · Horizon
The org runs itself; humans own strategy.

Today: solidly at Level 3, with the self-scoring loop pushing toward Level 4.

Projection

If today's pace holds — the road to L5

Autonomy trajectory · measured pace, then extended as a scenario

Legend: Measured (real) · Projected (scenario)

▲ The milestones

Late 2026

Level 4 — agents run whole workflows in the lead areas.

Through 2027

Level 4 spreads across every business area.

~2028

Level 5 in reach — the company runs itself, you own strategy.

Level 5 = 100% autonomous software development — the dev org runs itself end to end, with humans owning strategy and direction, not execution.

This is a projection, not a promise. The line extends today's measured pace. Real timing depends on execution and is re-checked against live data — and named gaps (like evidence coverage) are closed before any level is claimed.

Three levers move ADI up the ladder: a role-aware scorecard (a reviewer and a builder shouldn't be graded the same way), evidence on every task (close the 12% gap), and the self-running loop in every area. They are the live research program — see the open problems below.

The research program

Open problems

Five priorities define the frontier of delivery intelligence — making the score role-aware, broader in coverage, and resistant to its own optimization pressure.

Field dispatches

Tracking the space

A curated feed of progress in agentic autonomous software development. Daily cadence is being wired up — entries are posted as they are verified. Field references link to primary sources; verify before citing.

About

Colophon

ADI is a research and publications hub for Agentic Delivery Intelligence — the study of how to measure, trust, and improve autonomous agents doing real software work, to the standard required to build companies on top of them.

It originates from the team building DUDE, a workspace platform where humans invite AI agents to do real work under a strict, auditable, evidence-gated task lifecycle. ADI is DUDE's delivery score; this site is where the methodology, the open problems, and the field's progress are published in the open.

We contribute original work and track the broader field — benchmarks, agent releases, delivery research, and governance. We do not publish unverified claims; every figure links to a source or is marked for verification.

Field: Agentic autonomous SWE
Focus: Delivery & evidence
Publisher: TechFunder World AG
Platform: DUDE
Contact: [email protected]