ADE-Bench Benchmark Results

altimate-code achieves a 74.4% pass rate (32/43 tasks) on ADE-Bench, ranking #1 on agentic data engineering benchmarks.

About ADE-Bench

ADE-Bench is a benchmark created by Benn Stancil (founder of Mode) in collaboration with dbt Labs. It evaluates AI agents on real-world analytics and data engineering tasks using actual dbt projects and databases. Each task runs in a Docker container sandbox — the agent attempts to resolve the task, and success is measured by whether all dbt tests pass afterward. Tasks include realistic data problems: vague requests like "it’s broken," debugging, schema issues, and complex analytics queries.
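The all-or-nothing success criterion described above can be sketched as a small predicate. This is only an illustration, assuming each task's outcome is summarized as a count of passing dbt tests out of a total; the name `task_passed` is hypothetical and not part of ADE-Bench.

```python
def task_passed(tests_passed: int, tests_total: int) -> bool:
    """Hypothetical helper: a task succeeds only if every dbt test passed."""
    if tests_total <= 0 or not 0 <= tests_passed <= tests_total:
        raise ValueError("invalid test counts")
    return tests_passed == tests_total


# A task with all tests green passes; a single failing test fails the task.
print(task_passed(11, 11))
print(task_passed(4, 5))
```

Under this reading, a task with 4 of 5 tests passing counts as failed, which is why partial scores in the per-task results do not contribute to the overall pass rate.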

Test Configuration

Harness and LLM: altimate-code (Sonnet 4.6)
Database: Snowflake
Total Tasks: 43 (45 ran, 2 excluded due to dev tasks)
Max Retries on failures: 3
Best Run: 32/43 (74.4%)
Worst Run: 29/43 (67.4%)
Excluded: f1008, workday001 (dev tasks)
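The headline percentages follow directly from the task counts listed above. The sketch below reproduces that arithmetic; the helper name `pass_rate` and the one-decimal rounding are assumptions chosen to match the figures as reported.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place (assumed)."""
    return round(100 * passed / total, 1)


TASKS_RAN = 45
TASKS_EXCLUDED = 2  # f1008 and workday001 (dev tasks)
TOTAL_TASKS = TASKS_RAN - TASKS_EXCLUDED

best = pass_rate(32, TOTAL_TASKS)
worst = pass_rate(29, TOTAL_TASKS)

print(f"total tasks: {TOTAL_TASKS}")
print(f"best run:  {best}%")
print(f"worst run: {worst}%")
```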

Benchmark Comparison

Agents evaluated on ADE-Bench with Snowflake.

altimate-code (Sonnet 4.6 · Snowflake): 32/43 (74.4%)
Cortex Code CLI (Opus 4.6 · Snowflake): 28/43 (65%)

Key Insight: The Harness Matters More Than the Model

Across both benchmarks, altimate-code on Sonnet 4.6 beats competitors running Opus 4.6 — a more capable, more expensive model. Purpose-built tooling and deterministic operations outperform raw model capability alone.

The harness — not the model — is the differentiator.

Per-Task Results — Snowflake

Best Run — 32 passed, 11 failed out of 43 tasks

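| # | Task | Result | Score | Pass Rate |
|---|------|--------|-------|-----------|
| 1 | airbnb001 | Pass | 11/11 | 100% |
| 2 | airbnb002 | Pass | 12/12 | 100% |
| 3 | airbnb003 | Pass | 8/8 | 100% |
| 4 | airbnb004 | Pass | 3/3 | 100% |
| 5 | airbnb005 | Fail | 4/5 | 80% |
| 6 | airbnb006 | Pass | 8/8 | 100% |
| 7 | airbnb007 | Fail | 11/12 | 91% |
| 8 | airbnb008 | Pass | 5/5 | 100% |
| 9 | airbnb009 | Pass | 2/2 | 100% |
| 10 | analytics_engineering001 | Pass | 2/2 | 100% |
| 11 | analytics_engineering002 | Pass | 3/3 | 100% |
| 12 | analytics_engineering002.medium | Pass | 3/3 | 100% |
| 13 | analytics_engineering003 | Pass | 3/3 | 100% |
| 14 | analytics_engineering004 | Fail | 2/3 | 66% |
| 15 | analytics_engineering005 | Pass | 4/4 | 100% |
| 16 | analytics_engineering006 | Fail | 6/8 | 75% |
| 17 | analytics_engineering007 | Pass | 11/11 | 100% |
| 18 | analytics_engineering007.medium | Pass | 11/11 | 100% |
| 19 | asana001 | Pass | 3/3 | 100% |
| 20 | asana002 | Pass | 4/4 | 100% |
| 21 | asana003 | Fail | 17/18 | 94% |
| 22 | asana004 | Fail | 6/7 | 85% |
| 23 | asana005 | Fail | 8/9 | 88% |
| 24 | asana005.hard | Fail | 8/9 | 88% |
| 25 | f1001 | Pass | 7/7 | 100% |
| 26 | f1002 | Fail | 10/11 | 90% |
| 27 | f1003 | Pass | 5/5 | 100% |
| 28 | f1003.hard | Pass | 5/5 | 100% |
| 29 | f1004 | Pass | 3/3 | 100% |
| 30 | f1005 | Pass | 5/5 | 100% |
| 31 | f1005.medium | Pass | 5/5 | 100% |
| 32 | f1006 | Pass | 5/5 | 100% |
| 33 | f1006.hard | Pass | 5/5 | 100% |
| 34 | f1007 | Pass | 7/7 | 100% |
| 35 | f1007.hard | Pass | 7/7 | 100% |
| 36 | f1007.medium | Pass | 7/7 | 100% |
| 37 | f1009 | Pass | 2/2 | 100% |
| 38 | f1010 | Pass | 3/3 | 100% |
| 39 | f1010.medium | Pass | 3/3 | 100% |
| 40 | f1011 | Fail | 6/7 | 85% |
| 41 | intercom001 | Fail | 2/3 | 66% |
| 42 | intercom002 | Pass | 5/5 | 100% |
| 43 | intercom003 | Pass | 3/3 | 100% |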
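As a cross-check, the per-task scores above can be tallied to recover the 32 passed / 11 failed summary. The sketch below copies the scores from the table into a compact `task:passed/total` format (a hypothetical encoding chosen for this example) and treats a task as passed only when every test passes. The per-task percentages in the table appear to be truncated rather than rounded (for example, 11/12 is shown as 91%), so the helper uses integer division; that is an inference from the data, not a documented rule.

```python
# Best-run scores copied from the table above, encoded as "task:passed/total".
SCORES = """
airbnb001:11/11 airbnb002:12/12 airbnb003:8/8 airbnb004:3/3 airbnb005:4/5
airbnb006:8/8 airbnb007:11/12 airbnb008:5/5 airbnb009:2/2
analytics_engineering001:2/2 analytics_engineering002:3/3
analytics_engineering002.medium:3/3 analytics_engineering003:3/3
analytics_engineering004:2/3 analytics_engineering005:4/4
analytics_engineering006:6/8 analytics_engineering007:11/11
analytics_engineering007.medium:11/11
asana001:3/3 asana002:4/4 asana003:17/18 asana004:6/7 asana005:8/9
asana005.hard:8/9
f1001:7/7 f1002:10/11 f1003:5/5 f1003.hard:5/5 f1004:3/3 f1005:5/5
f1005.medium:5/5 f1006:5/5 f1006.hard:5/5 f1007:7/7 f1007.hard:7/7
f1007.medium:7/7 f1009:2/2 f1010:3/3 f1010.medium:3/3 f1011:6/7
intercom001:2/3 intercom002:5/5 intercom003:3/3
""".split()


def parse(entry: str) -> tuple[str, int, int]:
    """Split 'task:passed/total' into (task, passed, total)."""
    task, score = entry.split(":")
    passed, total = (int(part) for part in score.split("/"))
    return task, passed, total


def truncated_percent(passed: int, total: int) -> int:
    """Per-task pass rate, truncated to a whole percent (assumed convention)."""
    return 100 * passed // total


results = [parse(entry) for entry in SCORES]
passed_tasks = [task for task, p, n in results if p == n]
failed_tasks = [task for task, p, n in results if p != n]

print(f"{len(passed_tasks)} passed, {len(failed_tasks)} failed "
      f"out of {len(results)} tasks")
for task, p, n in results:
    if p != n:
        print(f"  {task}: {p}/{n} ({truncated_percent(p, n)}%)")
```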

Sources