Science Benchmark

About ScienceBenchmark

We introduce ScienceBenchmark, a new complex NL-to-SQL benchmark built on three real-world, highly domain-specific databases: research policy making (CORDIS), astrophysics (SDSS), and cancer research (OncoMX). For each domain, SQL experts and domain experts created high-quality NL/SQL pairs. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark built on complex real-world scientific databases, with challenging training and test data carefully validated by domain experts.


ScienceBenchmark Dataset

ScienceBenchmark Statistics

Database	Tables	Columns	Rows	Avg. rows/table	Size (GB)	NL/SQL pairs (manual + synthetic)
CORDIS	19	82	671K	35K	1	200 + 1306
SDSS	6	61	86M	14M	6.1	200 + 2061
OncoMX	25	106	65M	2.6M	12	200 + 1065

Submission

Please follow the Submission Guideline (below) and contact sciencebenchmark@init.lists.zhaw.ch for test evaluation.

Submission Guideline

To preserve the integrity of ScienceBenchmark, the test set is not publicly available. To have your model evaluated on the dev set and the hidden test set, please submit it according to the following tutorial: Coming Soon!

Citation

@article{10.14778/3636218.3636225,
author = {Zhang, Yi and Deriu, Jan and Katsogiannis-Meimarakis, George and Kosten, Catherine and Koutrika, Georgia and Stockinger, Kurt},
title = {ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems},
year = {2024},
issue_date = {December 2023},
publisher = {VLDB Endowment},
volume = {17},
number = {4},
issn = {2150-8097},
url = {https://doi.org/10.14778/3636218.3636225},
doi = {10.14778/3636218.3636225},
journal = {Proc. VLDB Endow.},
month = {mar},
pages = {685--698},
numpages = {14}
}

Overall Leaderboard - Execution Accuracy

Rank	Date	Model	Code	Dev (%)	Test (%)
1	Aug 20, 2023	T5-Large	link	33	23
2	Aug 20, 2023	SmBoP+GraPPa	link	24	10
3	Aug 20, 2023	ValueNet	link	35	9

CORDIS Leaderboard - Execution Accuracy

Rank	Date	Model	Code	Dev (%)	Test (%)
1	Aug 20, 2023	T5-Large	link	29	14
2	Aug 20, 2023	SmBoP+GraPPa	link	21	7
3	Aug 20, 2023	ValueNet	link	35	6

SDSS Leaderboard - Execution Accuracy

Rank	Date	Model	Code	Dev (%)	Test (%)
1	Aug 20, 2023	T5-Large	link	15	30
2	Aug 20, 2023	ValueNet	link	21	14
3	Aug 20, 2023	SmBoP+GraPPa	link	15	10

OncoMX Leaderboard - Execution Accuracy

Rank	Date	Model	Code	Dev (%)	Test (%)
1	Aug 20, 2023	T5-Large	link	56	24
2	Aug 20, 2023	SmBoP+GraPPa	link	35	13
3	Aug 20, 2023	ValueNet	link	49	8
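
Illustration: Execution Accuracy

The leaderboards report execution accuracy. This page does not describe the official evaluation script, so the following is only a minimal sketch of how the metric is commonly computed: a predicted query counts as correct when running it returns the same rows as the gold query. The sketch assumes SQLite and an order-insensitive row comparison. The function names and the toy `projects` table are hypothetical and are not part of ScienceBenchmark.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result rows match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        # A failing predicted query yields None and never matches.
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Toy database standing in for a benchmark database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(1, "A", 2019), (2, "B", 2020), (3, "C", 2020)],
)

pairs = [
    # Different SQL text, same rows: counted as correct.
    ("SELECT title FROM projects WHERE year >= 2020",
     "SELECT title FROM projects WHERE year = 2020"),
    # Wrong filter: counted as incorrect.
    ("SELECT title FROM projects WHERE year = 2019",
     "SELECT title FROM projects WHERE year = 2020"),
    # Invalid SQL (misspelled column): counted as incorrect.
    ("SELECT titel FROM projects",
     "SELECT title FROM projects"),
]
accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.0%}")
```

Comparing result rows rather than SQL strings means that syntactically different but equivalent queries are credited. The trade-off is that a wrong query can occasionally match by coincidence on a particular database instance.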