ScienceBenchmark

A Complex Real-World Scientific Benchmark for Evaluating Natural Language to SQL Systems

About ScienceBenchmark

We introduce ScienceBenchmark, a new, complex NL-to-SQL benchmark built on three real-world, highly domain-specific databases: research policy making (CORDIS), astrophysics (SDSS), and cancer research (OncoMX). For this benchmark, SQL experts and domain experts created high-quality NL/SQL pairs for each domain. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark built on complex, real-world scientific databases, with challenging training and test data carefully validated by domain experts.

ScienceBenchmark Statistics

Database  Tables  Columns  Rows  Avg. rows/table  Size (GB)  NL/SQL pairs (manual + synthetic)
CORDIS    19      82       671K  35K              1          200 + 1,306
SDSS      6       61       86M   14M              6.1        200 + 2,061
OncoMX    25      106      65M   2.6M             12         200 + 1,065
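In the statistics table, the average rows per table is the total row count divided by the number of tables, and the last column adds the 200 expert-written pairs per database to the synthetic ones. The short Python check below reproduces both from the rounded figures in the table, so the averages are approximate. The STATS dictionary and the summarize helper are illustrative names only, not part of any ScienceBenchmark release.

```python
# Figures copied from the statistics table above (K = thousand, M = million).
# Illustrative data structure only; not an official ScienceBenchmark file.
STATS = {
    # database: (tables, total_rows, manual_pairs, synthetic_pairs)
    "CORDIS": (19, 671_000, 200, 1306),
    "SDSS": (6, 86_000_000, 200, 2061),
    "OncoMX": (25, 65_000_000, 200, 1065),
}


def summarize(stats):
    """Return {database: (average rows per table, total NL/SQL pairs)}."""
    summary = {}
    for db, (tables, rows, manual, synthetic) in stats.items():
        # Row counts are rounded in the table, so this average is approximate.
        summary[db] = (rows / tables, manual + synthetic)
    return summary


if __name__ == "__main__":
    for db, (avg_rows, pairs) in summarize(STATS).items():
        print(f"{db}: ~{avg_rows:,.0f} rows/table, {pairs} NL/SQL pairs")
```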

Submission

Please follow the submission guideline below and contact sciencebenchmark@init.lists.zhaw.ch to have your model evaluated on the test set.

Submission Guideline

To preserve the integrity of ScienceBenchmark, the test set is not publicly available. To have your model evaluated on the dev set and the hidden test set, submit it according to the following tutorial: coming soon.

Citation

@article{10.14778/3636218.3636225,
author = {Zhang, Yi and Deriu, Jan and Katsogiannis-Meimarakis, George and Kosten, Catherine and Koutrika, Georgia and Stockinger, Kurt},
title = {ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems},
year = {2024},
issue_date = {December 2023},
publisher = {VLDB Endowment},
volume = {17},
number = {4},
issn = {2150-8097},
url = {https://doi.org/10.14778/3636218.3636225},
doi = {10.14778/3636218.3636225},
journal = {Proc. VLDB Endow.},
month = {mar},
pages = {685–698},
numpages = {14}
}

Overall Leaderboard - Execution Accuracy

Rank  Date          Model         Code  Dev (%)  Test (%)
1     Aug 20, 2023  T5-Large      link  33       23
2     Aug 20, 2023  SmBoP+GraPPa  link  24       10
3     Aug 20, 2023  ValueNet      link  35       9
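The leaderboards report execution accuracy: a predicted SQL query counts as correct when executing it returns the same result as the expert-written (gold) query. The Python sketch below illustrates the idea. It uses SQLite only so that it is self-contained, treats results as unordered multisets of rows, and scores a query that fails to execute as wrong. These comparison rules, the toy project table, and the function names results_match and execution_accuracy are assumptions for illustration; the official ScienceBenchmark evaluation may use other database engines and compare results differently (for example, respecting ORDER BY).

```python
import sqlite3
from collections import Counter


def results_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries yield the same rows, ignoring row order.

    Assumption: rows are compared as multisets (duplicates count, order does not).
    A predicted query that raises a database error is scored as incorrect.
    """
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    gold = conn.execute(gold_sql).fetchall()
    return Counter(predicted) == Counter(gold)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Percentage of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    correct = sum(results_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    # Toy database standing in for a real benchmark schema (hypothetical table).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (id INTEGER, title TEXT, cost REAL);
        INSERT INTO project VALUES (1, 'A', 10.0), (2, 'B', 25.0), (3, 'C', 5.0);
        """
    )
    pairs = [
        # Same rows in a different order: counted as correct.
        ("SELECT title FROM project WHERE cost > 8",
         "SELECT title FROM project WHERE cost > 8 ORDER BY id DESC"),
        # Missing the filter returns an extra row: incorrect.
        ("SELECT title FROM project",
         "SELECT title FROM project WHERE cost > 8"),
        # Misspelled column raises an error: incorrect.
        ("SELECT titel FROM project", "SELECT title FROM project"),
    ]
    print(round(execution_accuracy(conn, pairs), 1))  # → 33.3
```

Comparing multisets rather than ordered lists avoids penalizing a query that is correct but returns rows in a different order; an evaluator that needs to check ORDER BY semantics would compare the fetched lists directly instead.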

CORDIS Leaderboard - Execution Accuracy

Rank  Date          Model         Code  Dev (%)  Test (%)
1     Aug 20, 2023  T5-Large      link  29       14
2     Aug 20, 2023  SmBoP+GraPPa  link  21       7
3     Aug 20, 2023  ValueNet      link  35       6

SDSS Leaderboard - Execution Accuracy

Rank  Date          Model         Code  Dev (%)  Test (%)
1     Aug 20, 2023  T5-Large      link  15       30
2     Aug 20, 2023  ValueNet      link  21       14
3     Aug 20, 2023  SmBoP+GraPPa  link  15       10

OncoMX Leaderboard - Execution Accuracy

Rank  Date          Model         Code  Dev (%)  Test (%)
1     Aug 20, 2023  T5-Large      link  56       24
2     Aug 20, 2023  SmBoP+GraPPa  link  35       13
3     Aug 20, 2023  ValueNet      link  49       8
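The Overall figures are consistent with an unweighted mean of the three per-domain scores, rounded to the nearest percent, and every leaderboard is ordered by test accuracy rather than dev accuracy. Both points are inferred from the published numbers, not from a documented scoring rule. The Python sketch below reproduces the Overall table from the per-domain rows under that assumption; DOMAIN_SCORES, overall, and rank_by_test are hypothetical names used only for illustration.

```python
from statistics import mean

# (dev %, test %) per domain, copied from the per-domain leaderboards above.
DOMAIN_SCORES = {
    "T5-Large": {"CORDIS": (29, 14), "SDSS": (15, 30), "OncoMX": (56, 24)},
    "SmBoP+GraPPa": {"CORDIS": (21, 7), "SDSS": (15, 10), "OncoMX": (35, 13)},
    "ValueNet": {"CORDIS": (35, 6), "SDSS": (21, 14), "OncoMX": (49, 8)},
}


def overall(scores: dict[str, dict[str, tuple[int, int]]]) -> dict[str, tuple[int, int]]:
    """Assumed aggregation: unweighted mean over domains, rounded to whole percent."""
    result = {}
    for model, domains in scores.items():
        dev = mean(d for d, _ in domains.values())
        test = mean(t for _, t in domains.values())
        result[model] = (round(dev), round(test))
    return result


def rank_by_test(table: dict[str, tuple[int, int]]) -> list[str]:
    """Order models by test accuracy, highest first (dev accuracy is ignored)."""
    return sorted(table, key=lambda model: table[model][1], reverse=True)


if __name__ == "__main__":
    table = overall(DOMAIN_SCORES)
    print(table)
    # → {'T5-Large': (33, 23), 'SmBoP+GraPPa': (24, 10), 'ValueNet': (35, 9)}
    print(rank_by_test(table))
    # → ['T5-Large', 'SmBoP+GraPPa', 'ValueNet']
```

Ranking by test accuracy explains why ValueNet has the highest overall dev accuracy (35%) yet ranks last.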