DAB Benchmark Results

altimate-code is #1 on DAB — 60.4% Pass@1, ahead of every competing agent, including ones running Claude Opus 4.6.

About DAB

DAB (Data Agent Benchmark) is the first benchmark for evaluating data agents on realistic, complex, data-oriented tasks — a collaboration between the EPIC Data Lab at UC Berkeley and Hasura PromptQL. Unlike prior SQL-only or single-database benchmarks, DAB stresses agents under real enterprise data complexity: multi-database integration, ill-formatted key joins, unstructured-text transformation, and domain knowledge. It spans 54 queries across 12 datasets, 9 domains, and 4 database systems — PostgreSQL, MongoDB, SQLite, and DuckDB.

Test Configuration

Agent: altimate-code
Backbone LLM: Claude Sonnet 4.6 (via OpenRouter)
Result: #1, Pass@1 0.604
Trials: 5 per query (270 trials across 54 queries)
Datasets: 12 datasets · 9 domains
Databases: PostgreSQL, MongoDB, SQLite, DuckDB
Dataset hints: Yes (db_description_withhint.txt)
Submission: PR #44
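
To make the 5-trials-per-query setup concrete, here is a minimal sketch of how a Pass@1 score can be estimated from repeated trials. It assumes the common estimator, where each query's pass rate is the fraction of its trials that succeeded and Pass@1 is the mean of those rates. The official DAB scoring script may differ in detail, and the function name `pass_at_1` and the sample outcomes are hypothetical, not taken from the benchmark.

```python
from statistics import mean


def pass_at_1(trial_results: dict[str, list[bool]]) -> float:
    """Mean over queries of the fraction of trials that passed.

    Assumption: this is the usual empirical Pass@1 estimator, not
    necessarily the exact formula used by the DAB scoring script.
    """
    if not trial_results:
        raise ValueError("no queries supplied")
    per_query = []
    for query_id, outcomes in trial_results.items():
        if not outcomes:
            raise ValueError(f"query {query_id!r} has no trials")
        # True counts as 1 and False as 0, so this is the per-query pass rate.
        per_query.append(sum(outcomes) / len(outcomes))
    return mean(per_query)


# Hypothetical outcomes: three queries, five trials each.
example = {
    "q1": [True, True, True, True, True],
    "q2": [True, False, True, False, False],
    "q3": [False, False, False, False, False],
}
score = pass_at_1(example)
print(f"Pass@1 = {score:.3f}")  # prints "Pass@1 = 0.467"
```

When every query has the same number of trials, as in this configuration, the result equals the total number of passing trials divided by the total number of trials.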

Leaderboard

Official DAB leaderboard — Pass@1, ranked. altimate-code holds the #1 spot.

Rank  Agent                                  Backbone LLM       Pass@1
#1    Altimate Code                          Claude Sonnet 4.6  60.4%
#2    Pi Coding Agent                        Claude Opus 4.6    56.0%
#3    PromptQL                               Gemini 3.1 Pro     54.3%
#4    PromptQL                               Claude Opus 4.6    50.8%
#5    Oracle Forge (Tenacious Intelligence)  Claude Sonnet 4.6  45.5%
#6    Claude Opus 4.6 (ReAct)                Claude Opus 4.6    43.8%
#7    Gemini-3-Pro (ReAct)                   Gemini 3 Pro       38.0%
#8    GPT-5-mini (ReAct)                     GPT-5-mini         30.0%
#9    GPT-5.2 (ReAct)                        GPT-5.2            25.0%
#10   Kimi-K2 (ReAct)                        Kimi-K2            23.0%
#11   Oracle Forge (Team Cohere)             Gemini 2.0 Flash   12.8%
#12   Gemini-2.5-Flash (ReAct)               Gemini 2.5 Flash   9.0%

Benchmark Composition

12 datasets spanning 4 database systems — every task requires the agent to work across heterogeneous stores.

Dataset           Databases                               DBs  Tables  Queries
agnews            MongoDB, SQLite                         2    3       4
bookreview        PostgreSQL, SQLite                      2    2       3
crmarenapro       DuckDB, PostgreSQL, SQLite              6    27      13
deps_dev_v1       DuckDB, SQLite                          2    3       2
github_repos      DuckDB, SQLite                          2    6       4
googlelocal       PostgreSQL, SQLite                      2    2       4
music_brainz_20k  DuckDB, SQLite                          2    2       3
pancancer_atlas   DuckDB, PostgreSQL                      2    3       3
patents           PostgreSQL, SQLite                      2    2       3
stockindex        DuckDB, SQLite                          2    2       3
stockmarket       DuckDB, SQLite                          2    2,754   5
yelp              DuckDB, MongoDB                         2    5       7
Total             PostgreSQL · MongoDB · SQLite · DuckDB  28   2,811   54