DAB Benchmark Results
altimate-code ranks #1 on DAB with 60.4% Pass@1 using Claude Sonnet 4.6, ahead of every other agent on the leaderboard, including ones running Claude Opus 4.6.
About DAB
DAB (Data Agent Benchmark) is the first benchmark for evaluating data agents on realistic, complex, data-oriented tasks — a collaboration between the EPIC Data Lab at UC Berkeley and Hasura PromptQL. Unlike prior SQL-only or single-database benchmarks, DAB stresses agents under real enterprise data complexity: multi-database integration, ill-formatted key joins, unstructured-text transformation, and domain knowledge. It spans 54 queries across 12 datasets, 9 domains, and 4 database systems — PostgreSQL, MongoDB, SQLite, and DuckDB.
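Here is a minimal sketch of what an "ill-formatted key join" across separate databases looks like. Two in-memory SQLite connections stand in for two independent stores. The `books`/`reviews` tables, their key formats, and the `normalize_key` helper are hypothetical illustrations, not DAB's actual schemas or evaluation code. Because the two sides live in different databases and their keys are spelled differently, the join cannot happen inside either database. The agent has to pull rows from each store and reconcile the keys itself.

```python
import re
import sqlite3

# Two separate connections stand in for two independent stores.
# Hypothetical tables and key formats, for illustration only.
books_db = sqlite3.connect(":memory:")
reviews_db = sqlite3.connect(":memory:")

books_db.execute("CREATE TABLE books (book_id TEXT, title TEXT)")
books_db.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("bid_001", "Dune"), ("bid_002", "Emma")],
)
reviews_db.execute("CREATE TABLE reviews (book_ref TEXT, rating INTEGER)")
reviews_db.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("BOOK-1", 5), ("book-2", 3), ("BOOK-1", 4)],
)


def normalize_key(raw: str) -> int:
    """Hypothetical rule: keep only the numeric part, so 'bid_001' and 'BOOK-1' both map to 1."""
    match = re.search(r"\d+", raw)
    if match is None:
        raise ValueError(f"no numeric id in {raw!r}")
    return int(match.group())


def average_rating_by_title() -> dict[str, float]:
    # Read each store separately, then join in application code on the normalized key.
    titles = {
        normalize_key(book_id): title
        for book_id, title in books_db.execute("SELECT book_id, title FROM books")
    }
    ratings: dict[int, list[int]] = {}
    for ref, rating in reviews_db.execute("SELECT book_ref, rating FROM reviews"):
        ratings.setdefault(normalize_key(ref), []).append(rating)
    return {titles[k]: sum(v) / len(v) for k, v in ratings.items() if k in titles}


print(average_rating_by_title())  # {'Dune': 4.5, 'Emma': 3.0}
```

Extracting the digits only works for this toy data. Real key mismatches usually need rules specific to the data at hand.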
Test Configuration
| Setting | Value |
|---|---|
| Agent | altimate-code |
| Backbone LLM | Claude Sonnet 4.6 (via OpenRouter) |
| Result | #1 — Pass@1 0.604 |
| Trials | 5 per query (270 trials across 54 queries) |
| Datasets | 12 datasets · 9 domains |
| Databases | PostgreSQL, MongoDB, SQLite, DuckDB |
| Dataset hints | Yes (db_description_withhint.txt) |
| Submission | PR #44 |
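The Pass@1 figure aggregates the 5 trials run per query. This sketch assumes the common unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k) per query (n trials, c of them correct), averaged over queries; it is not taken from DAB's scoring code. For k = 1 the estimator reduces to c / n. The trial outcomes below are made up for illustration and are not real DAB results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one query: n trials, c of them correct."""
    if n - c < k:
        # Every size-k sample must contain at least one correct trial.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: dict[str, list[bool]], k: int = 1) -> float:
    """Average the per-query estimate over all queries."""
    scores = [pass_at_k(len(trials), sum(trials), k) for trials in results.values()]
    return sum(scores) / len(scores)


# Hypothetical outcomes: 3 queries x 5 trials each (not real DAB data).
example = {
    "q1": [True, True, True, True, False],    # 4/5 -> 0.8
    "q2": [False, False, True, False, False], # 1/5 -> 0.2
    "q3": [True, True, True, True, True],     # 5/5 -> 1.0
}
print(round(benchmark_pass_at_k(example), 3))  # 0.667
```

Under that assumption, when every query gets the same number of trials, Pass@1 equals total passing trials divided by total trials, so 0.604 over 270 trials would correspond to roughly 163 passes (163 / 270 ≈ 0.6037).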
Leaderboard
Official DAB leaderboard, ranked by Pass@1. altimate-code holds the #1 spot.
Benchmark Composition
12 datasets spanning 4 database systems. Every dataset combines at least two systems, so the agent has to work across heterogeneous stores.
| Dataset | Database systems | DB instances | Tables | Queries |
|---|---|---|---|---|
| agnews | MongoDB, SQLite | 2 | 3 | 4 |
| bookreview | PostgreSQL, SQLite | 2 | 2 | 3 |
| crmarenapro | DuckDB, PostgreSQL, SQLite | 6 | 27 | 13 |
| deps_dev_v1 | DuckDB, SQLite | 2 | 3 | 2 |
| github_repos | DuckDB, SQLite | 2 | 6 | 4 |
| googlelocal | PostgreSQL, SQLite | 2 | 2 | 4 |
| music_brainz_20k | DuckDB, SQLite | 2 | 2 | 3 |
| pancancer_atlas | DuckDB, PostgreSQL | 2 | 3 | 3 |
| patents | PostgreSQL, SQLite | 2 | 2 | 3 |
| stockindex | DuckDB, SQLite | 2 | 2 | 3 |
| stockmarket | DuckDB, SQLite | 2 | 2754 | 5 |
| yelp | DuckDB, MongoDB | 2 | 5 | 7 |
| Total | PostgreSQL · MongoDB · SQLite · DuckDB | 28 | 2,811 | 54 |
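The Total row is the column-wise sum of the twelve dataset rows. As a quick consistency check, this sketch re-derives it from the values in the table. The tuple layout and the `totals` helper are our own illustration, not part of DAB's tooling.

```python
# Rows copied from the composition table:
# (dataset, database systems, DB instances, tables, queries)
ROWS = [
    ("agnews", {"MongoDB", "SQLite"}, 2, 3, 4),
    ("bookreview", {"PostgreSQL", "SQLite"}, 2, 2, 3),
    ("crmarenapro", {"DuckDB", "PostgreSQL", "SQLite"}, 6, 27, 13),
    ("deps_dev_v1", {"DuckDB", "SQLite"}, 2, 3, 2),
    ("github_repos", {"DuckDB", "SQLite"}, 2, 6, 4),
    ("googlelocal", {"PostgreSQL", "SQLite"}, 2, 2, 4),
    ("music_brainz_20k", {"DuckDB", "SQLite"}, 2, 2, 3),
    ("pancancer_atlas", {"DuckDB", "PostgreSQL"}, 2, 3, 3),
    ("patents", {"PostgreSQL", "SQLite"}, 2, 2, 3),
    ("stockindex", {"DuckDB", "SQLite"}, 2, 2, 3),
    ("stockmarket", {"DuckDB", "SQLite"}, 2, 2754, 5),
    ("yelp", {"DuckDB", "MongoDB"}, 2, 5, 7),
]


def totals(rows):
    """Return (distinct systems, DB instances, tables, queries) summed over all rows."""
    systems = set().union(*(row[1] for row in rows))
    return (
        sorted(systems),
        sum(row[2] for row in rows),
        sum(row[3] for row in rows),
        sum(row[4] for row in rows),
    )


print(totals(ROWS))  # (['DuckDB', 'MongoDB', 'PostgreSQL', 'SQLite'], 28, 2811, 54)
```

The 54 queries at 5 trials each also account for the 270 trials listed in the test configuration.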
Sources
- DAB Leaderboard (UC Berkeley EPIC Data Lab) — Official leaderboard and benchmark overview
- DataAgentBench on GitHub — Benchmark repository, datasets, and submission guide
- altimate-code submission (PR #44) — Our leaderboard submission
- DAB paper (arXiv) — Methodology and analysis