LEADERBOARD
Comparing optimization strategies across chemical datasets
How to Read the Data
Table View: Values show the median or mean performance. Hover over any value to see the full range (min–max) and number of runs.
Plot View: Horizontal lines indicate median/mean values (colored by method type). Red circles (●) mark minimum values, green triangles (▲) mark maximum values, with thin gray lines connecting the range.
Views: Toggle between Median (Min-Max), a robust measure of central tendency, and Mean (Min-Max), the arithmetic average.
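The difference between the two views can be illustrated with a minimal sketch. The function name `summarize`, its return keys, and the sample scores are hypothetical and are not taken from the leaderboard's own code. It simply reduces a list of per-run scores to a central value plus the min–max range and run count described above.

```python
from statistics import mean, median


def summarize(scores, view="median"):
    """Reduce per-run scores to a central value, a min-max range, and a run count.

    `view` selects the central value: "median" or "mean".
    This is an illustrative helper, not the leaderboard's implementation.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if view not in ("median", "mean"):
        raise ValueError(f"unknown view: {view!r}")
    center = median(scores) if view == "median" else mean(scores)
    return {
        "center": center,
        "min": min(scores),
        "max": max(scores),
        "runs": len(scores),
    }


# Made-up scores with one unusually good run.
runs = [0.50, 0.55, 0.60, 0.65, 0.95]
print(summarize(runs, view="median"))
print(summarize(runs, view="mean"))
```

With these made-up scores the median stays at the middle run (0.60), while the mean is pulled upward by the single 0.95 run. That is the sense in which the median view is the more robust of the two.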
Showing: Median (Min-Max)

| Rank | Type | Method | Pass@3 (Early) | Pass@5 (Early-Mid) | Pass@10 (Mid) | Pass@20 (Final) |
|---|---|---|---|---|---|---|
| 1 | BO | atlas-ei | 60.0% ± 21.8% (20) | 66.6% ± 17.3% (20) | 78.5% ± 12.4% (20) | 82.2% ± 12.1% (20) |
| 2 | BO | atlas-pi | 59.9% ± 12.7% (20) | 61.5% ± 13.9% (20) | 74.4% ± 10.2% (20) | 77.3% ± 11.4% (20) |
| 3 | BO | atlas-ucb | 57.6% ± 24.4% (20) | 67.8% ± 23.2% (20) | 80.1% ± 12.5% (20) | 87.5% ± 8.1% (20) |
| | LLM | claude-3-5-haiku-latest | 57.4% ± 15.1% (20) | 67.0% ± 15.3% (20) | 78.3% ± 11.8% (20) | 82.5% ± 13.0% (20) |
| | LLM | claude-3-7-sonnet-latest | 54.1% ± 5.8% (20) | 64.5% ± 14.3% (20) | 89.2% ± 9.2% (20) | 95.6% ± 2.5% (20) |
| | LLM | claude-3-7-sonnet-latest-thinking | 55.0% ± 7.3% (20) | 61.5% ± 14.4% (20) | 77.6% ± 18.9% (20) | 95.5% ± 2.0% (20) |
| | LLM | gpt-4o-mini | 55.7% ± 20.3% (20) | 60.7% ± 22.5% (20) | 65.8% ± 23.3% (20) | 65.8% ± 23.3% (20) |
| | LLM | gemini-2.5-pro-preview-03-25-medium | 52.4% ± 6.1% (20) | 55.3% ± 6.7% (20) | 60.2% ± 11.8% (20) | 85.9% ± 13.8% (20) |
| | LLM | gpt-4o | 52.2% ± 18.5% (20) | 61.4% ± 21.6% (20) | 69.5% ± 23.3% (20) | 71.8% ± 21.2% (20) |
| | LLM | gemini-2.5-flash-preview-04-17-medium | 50.6% ± 12.0% (20) | 53.6% ± 8.4% (20) | 65.5% ± 17.0% (20) | 86.3% ± 15.9% (20) |
Currently showing top 10 of 25 results
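For readers who want to work with the table values programmatically, here is a minimal parsing sketch for cells of the form `60.0% ± 21.8% (20)`. The names `Cell` and `parse_cell` are hypothetical. Two readings are assumptions rather than anything the page states: that the figure after `±` is a spread around the central value, and that the parenthesized integer is the number of runs.

```python
import re
from typing import NamedTuple


class Cell(NamedTuple):
    center: float  # central value, in percent
    spread: float  # value after "±", in percent (interpretation assumed)
    runs: int      # parenthesized integer (assumed to be the run count)


# Matches strings such as "60.0% ± 21.8% (20)".
_CELL = re.compile(r"^\s*([\d.]+)%\s*±\s*([\d.]+)%\s*\((\d+)\)\s*$")


def parse_cell(text):
    """Split one leaderboard cell into its three numeric parts."""
    match = _CELL.match(text)
    if match is None:
        raise ValueError(f"unrecognized cell: {text!r}")
    return Cell(float(match.group(1)), float(match.group(2)), int(match.group(3)))


cell = parse_cell("60.0% ± 21.8% (20)")
print(cell.center, cell.spread, cell.runs)
```

Anything that does not match the pattern raises `ValueError` instead of being guessed at, so a change in the page's cell layout is caught early.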