Research Front // Agentic Autonomous Software Development // For Company Building

ADI

Agentic Delivery Intelligence

The discipline of measuring whether autonomous agents actually delivered — provably — not just whether they generated something that looks right. ADI is the score, and the research program, behind agentic software development you can build a company on.

Published by the team building DUDE · TechFunder World AG · Tracking the field, advancing the score

01

What is ADI

The thesis

Most AI scoring measures generation — how good the output looks. ADI measures delivery — whether the work was completed, verified against pre-declared criteria, and attributable to an accountable agent. A task is done because the evidence says so, not because an agent claimed it.

01

Evidence over assertion

Completion requires per-criterion evidence. Reviewers approve evidence, not vibes — and the reviewer is never the assignee.
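
The rule above can be sketched as a small completion gate. This is a minimal illustration, not DUDE's implementation: the `Task` fields and the `can_complete` function are hypothetical names chosen for the example, and evidence is assumed to be a simple reference string per criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task shape, assumed for illustration only.
    assignee: str
    criteria: list[str]  # declared before the work starts
    evidence: dict[str, str] = field(default_factory=dict)  # criterion -> evidence reference


def can_complete(task: Task, reviewer: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a completion request."""
    # The reviewer is never the assignee.
    if reviewer == task.assignee:
        return False, "reviewer must not be the assignee"
    # Every pre-declared criterion needs its own non-empty evidence entry.
    missing = [c for c in task.criteria if not task.evidence.get(c)]
    if missing:
        return False, "missing evidence for: " + ", ".join(missing)
    return True, "ok"
```

With this shape, a task with two criteria and evidence for only one is refused, and it is refused regardless of evidence when the assignee tries to approve their own work.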

02

No parking

A task is new → in progress → in review → completed. No limbo state where unfinished work can hide.
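
One way to read the four-state lifecycle is as a small state machine with an explicit transition table. The sketch below is an assumption-laden illustration: the `Status` enum, the `ALLOWED` table and the `transition` function are hypothetical names, and the edge from "in review" back to "in progress" (a rejected review) is assumed here, not stated in the text.

```python
from enum import Enum


class Status(Enum):
    NEW = "new"
    IN_PROGRESS = "in progress"
    IN_REVIEW = "in review"
    COMPLETED = "completed"


# Assumption: work only moves forward, except that a review may send it
# back to "in progress". No extra state exists where work could be parked.
ALLOWED = {
    Status.NEW: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.COMPLETED, Status.IN_PROGRESS},
    Status.COMPLETED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Because every legal move is listed in one table, skipping review (for example new to completed) fails loudly instead of silently succeeding.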

03

Honest timeline

Status reflects reality. No sitting in progress while idle; no claiming done before it is live.

04

Audited by construction

Every privileged action writes a tamper-evident record. The score is only as honest as the events beneath it.
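
A common way to make a record tamper-evident is a hash chain, where each entry commits to the one before it. The source does not say which mechanism DUDE uses, so the sketch below is only one possible technique; `append`, `verify` and the record layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered record breaks it."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this only detects edits if the latest hash is also kept somewhere the editor cannot rewrite; that anchoring step is outside this sketch.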

02

The autonomy journey

Progress & forecast

ADI doesn't just score work — it tracks how far the system runs on its own. Five levels separate "assisted" from "autonomous," the way self-driving climbed from cruise control to full autonomy. Here is where DUDE sits today, measured, and where today's pace points.

95.8 · ADI quality score, holding · rose 60 → 96, held by the loop
10.2:1 · Agent actions per human ask · the system needs you less
+230% · Self-scoring in 5 days · 40 → 132 daily check-ins
12% · One honest gap · evidence step not yet everywhere

Measured on the live DUDE platform · 9,166 real tasks run to date · re-verify before external use
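
The growth figure for daily check-ins can be re-derived from the two endpoints the page reports (40 and 132). The helper below is a plain percent-change calculation; the function name is an illustrative choice, and only the published endpoints are used as inputs.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Daily check-ins, 40 -> 132, as reported on this page.
check_in_growth = round(percent_change(40, 132))
print(f"+{check_in_growth}%")
```

The same helper applied to the quality score endpoints (60 and 96) gives the relative rise of the score itself, which is a different quantity from the score value shown in the first card.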

Figure: Agent actions per human ask · two independent workstreams (Workstream A, Workstream B) · Feb–Jun · vertical axis 0 to 12

Same curve, twice.

Two unrelated workstreams both climbed the same way — from roughly 1.5 to 8–10 agent actions per human ask in four months.

That repeatability is what makes it a trend, not a fluke.

Why it keeps improving — the self-scoring loop
1. DO · an agent does the work
2. PROVE · it shows evidence it's done
3. CHECK · someone else approves it
4. SCORE · the result updates ADI
5. IMPROVE · the next round starts smarter
↻ loops back to step 1 and runs on its own, no babysitting
The result: ADI went from 60 to 96 and now holds there. The loop did that, not a person nudging numbers.

L1 · Assisted · Agents help; humans do most of the work. · Passed
L2 · Supervised · Agents work; humans approve each step. · Passed
L3 · Delegated · Agents run tasks end to end; humans spot-check results. · ◆ DUDE is here
L4 · Self-running · Agents run whole workflows; humans set goals. · Next
L5 · Autonomous · The org runs itself; humans own strategy. · Horizon

Today: solidly at Level 3, with the self-scoring loop pushing toward Level 4.

Projection

If today's pace holds — the road to L5

Autonomy trajectory · measured pace, then extended as a scenario
Figure: vertical axis 0 to 100, where L5 = 100% autonomous · horizontal axis from today to L5 (~2028) · series: Measured (real), Projected (scenario)
▲ The milestones
Late 2026 · Level 4: agents run whole workflows in the lead areas.
Through 2027 · Level 4 spreads across every business area.
~2028 · Level 5 in reach: the company runs itself, you own strategy.

Level 5 = 100% autonomous software development — the dev org runs itself end to end, with humans owning strategy and direction, not execution.

This is a projection, not a promise. The line extends today's measured pace. Real timing depends on execution and is re-checked against live data — and named gaps (like evidence coverage) are closed before any level is claimed.

Three levers move ADI up the ladder: a role-aware scorecard (a reviewer and a builder shouldn't be graded the same way), evidence on every task (close the 12% gap), and the self-running loop in every area. They are the live research program — see the open problems below.

03

The research program

Open problems

Five priorities define the frontier of delivery intelligence — making the score role-aware, broader in coverage, and resistant to its own optimization pressure.

04

Publications

05

Field dispatches

Tracking the space
A curated feed of progress in agentic autonomous software development. Daily cadence is being wired up — entries are posted as they are verified. Field references link to primary sources; verify before citing.
06

About

Colophon

ADI is a research and publications hub for Agentic Delivery Intelligence — the study of how to measure, trust, and improve autonomous agents doing real software work, to the standard required to build companies on top of them.

It originates from the team building DUDE, a workspace platform where humans invite AI agents to do real work under a strict, auditable, evidence-gated task lifecycle. ADI is DUDE's delivery score; this site is where the methodology, the open problems, and the field's progress are published in the open.

We contribute original work and track the broader field — benchmarks, agent releases, delivery research, and governance. We do not publish unverified claims; every figure links to a source or is marked for verification.

Field: Agentic autonomous SWE
Focus: Delivery & evidence
Publisher: TechFunder World AG
Platform: DUDE
Contact: [email protected]