About ScienceBenchmark
We introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases: Research Policy Making (CORDIS), Astrophysics (SDSS), and Cancer Research (OncoMX). For this new benchmark, SQL experts and domain experts created high-quality NL/SQL pairs for each domain. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark built on complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts.
ScienceBenchmark Statistics
Dataset | Tables | Columns | Rows | Avg. rows/table | Size (GB) | #NL/SQL pairs (manual + synthetic) |
---|---|---|---|---|---|---|
CORDIS | 19 | 82 | 671K | 35K | 1 | 200 + 1306 |
SDSS | 6 | 61 | 86M | 14M | 6.1 | 200 + 2061 |
OncoMX | 25 | 106 | 65M | 2.6M | 12 | 200 + 1065 |
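The average rows-per-table column (Avg R/T) appears to be the total row count divided by the number of tables. The short sketch below recomputes it from the rounded figures in the table; the dictionary layout and the function name `avg_rows_per_table` are illustrative assumptions, not part of the benchmark.

```python
# Table and row counts copied from the statistics table above.
# The row counts are already rounded there (e.g. 671K), so results are approximate.
stats = {
    "CORDIS": {"tables": 19, "rows": 671_000},
    "SDSS": {"tables": 6, "rows": 86_000_000},
    "OncoMX": {"tables": 25, "rows": 65_000_000},
}


def avg_rows_per_table(tables: int, rows: int) -> float:
    """Average number of rows per table (hypothetical helper)."""
    return rows / tables


averages = {name: avg_rows_per_table(**s) for name, s in stats.items()}

for name, avg in averages.items():
    print(f"{name}: about {avg:,.0f} rows per table")
```

Rounded to the precision used in the table, the results agree with the listed 35K, 14M, and 2.6M.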
Submission
Please follow the Submission Guideline (below) and contact sciencebenchmark@init.lists.zhaw.ch for test evaluation.
Submission Guideline
To preserve the integrity of ScienceBenchmark, the test set is not publicly available. To have your model evaluated on the dev set and the hidden test set, submit it according to the following tutorial: Coming Soon!
Citation
@article{10.14778/3636218.3636225,
author = {Zhang, Yi and Deriu, Jan and Katsogiannis-Meimarakis, George and Kosten, Catherine and Koutrika, Georgia and Stockinger, Kurt},
title = {ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems},
year = {2024},
issue_date = {December 2023},
publisher = {VLDB Endowment},
volume = {17},
number = {4},
issn = {2150-8097},
url = {https://doi.org/10.14778/3636218.3636225},
doi = {10.14778/3636218.3636225},
journal = {Proc. VLDB Endow.},
month = {mar},
pages = {685–698},
numpages = {14}
}
Overall Leaderboard - Execution Accuracy
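Execution accuracy is commonly understood as the share of predicted SQL queries whose execution result matches the result of the gold query. The following is a minimal sketch of that idea, not the official ScienceBenchmark evaluation script. It assumes an in-memory SQLite database with a made-up `projects` table, compares results as order-insensitive multisets of rows, and counts a failing predicted query as wrong; the official evaluation may differ on each of these points.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result rows match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        # A failing gold or predicted query never counts as correct.
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, not part of ScienceBenchmark.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER, title TEXT, budget REAL)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [(1, "Alpha", 100.0), (2, "Beta", 250.0), (3, "Gamma", 250.0)],
)

pairs = [
    # Different SQL, same result rows: counted as correct.
    ("SELECT title FROM projects WHERE budget >= 250",
     "SELECT title FROM projects WHERE budget > 100"),
    # Both run, but the results differ: counted as wrong.
    ("SELECT COUNT(*) FROM projects", "SELECT MAX(budget) FROM projects"),
    # Predicted query references a misspelled column and fails: counted as wrong.
    ("SELECT titel FROM projects", "SELECT title FROM projects"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.0%}")
```

Comparing results rather than SQL strings lets syntactically different but equivalent queries count as correct, which is why the first pair above passes.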
CORDIS Leaderboard - Execution Accuracy
Rank | Date | Model | Code | Dev (%) | Test (%) |
---|---|---|---|---|---|
1 | Aug 20, 2023 | T5-Large | link | 29% | 14% |
2 | Aug 20, 2023 | SmBoP+GraPPa | link | 21% | 7% |
3 | Aug 20, 2023 | ValueNet | link | 35% | 6% |
SDSS Leaderboard - Execution Accuracy
Rank | Date | Model | Code | Dev (%) | Test (%) |
---|---|---|---|---|---|
1 | Aug 20, 2023 | T5-Large | link | 15% | 30% |
2 | Aug 20, 2023 | ValueNet | link | 21% | 14% |
3 | Aug 20, 2023 | SmBoP+GraPPa | link | 15% | 10% |
OncoMX Leaderboard - Execution Accuracy
Rank | Date | Model | Code | Dev (%) | Test (%) |
---|---|---|---|---|---|
1 | Aug 20, 2023 | T5-Large | link | 56% | 24% |
2 | Aug 20, 2023 | SmBoP+GraPPa | link | 35% | 13% |
3 | Aug 20, 2023 | ValueNet | link | 49% | 8% |