Loading Datasets and Models...
datasets
The full dataset definitions in Olympus can be found here.
Each dataset is named dataset_{name}, where {name} is the name of the dataset (e.g., dataset_Buchwald_Hartwig).
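The naming convention can be expressed as a small helper. This is a hypothetical sketch, not part of Olympus: it assumes that spaces in a dataset's display name become underscores, which matches the documented example but has not been checked against every dataset listed below.

```python
def dataset_identifier(display_name: str) -> str:
    """Hypothetical helper: build a dataset_{name} identifier from a display name.

    ASSUMPTION: whitespace in the display name is replaced by underscores.
    """
    name = "_".join(display_name.split())
    return f"dataset_{name}"


print(dataset_identifier("Buchwald Hartwig"))  # dataset_Buchwald_Hartwig
```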
Buchwald Hartwig (size = 4600)
Palladium-catalyzed Buchwald-Hartwig reactions.
Reaction Setup:
Solutions were prepared in DMSO: catalyst (0.05 M), aryl halide (0.5 M), toluidine (0.5 M), additive (0.5 M), and base (0.75 M).
These solutions were added to a 384-well source plate (80 μL per well). The Mosquito HTS liquid handling robot was used to dose each
of these solutions (200 nL each) into a 1536-well plate. The plate was sealed and heated to 60 °C for 16 hours. The plate was then
opened and the Mosquito was used to add internal standard to each well (3 μL of 0.0025 M di-tert-butylbiphenyl solution in DMSO).
At that point, aliquots were sampled into 384-well plates and analyzed by UPLC to quantify product yield.
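The dosing arithmetic above can be checked with a short Python sketch. It assumes every solution is dispensed at exactly the nominal 200 nL and, only for the purpose of expressing equivalents, treats the aryl halide as the 1.0 equiv reference; the variable names are illustrative.

```python
# Amount per well = stock concentration (M) x dispensed volume.
# With volume in nL, 1 M x 1 nL = 1e-9 mol = 1 nmol, so M x nL gives nmol directly.
DOSE_NL = 200  # every reagent solution was dosed at 200 nL

stock_molarity = {
    "catalyst": 0.05,
    "aryl_halide": 0.5,
    "toluidine": 0.5,
    "additive": 0.5,
    "base": 0.75,
}

amounts_nmol = {name: molarity * DOSE_NL for name, molarity in stock_molarity.items()}

# ASSUMPTION: the aryl halide is taken as the 1.0 equiv reference.
reference_nmol = amounts_nmol["aryl_halide"]
equivalents = {name: amount / reference_nmol for name, amount in amounts_nmol.items()}

# Internal standard: 3 uL (= 3000 nL) of 0.0025 M di-tert-butylbiphenyl.
internal_standard_nmol = 0.0025 * 3000

for name, amount in amounts_nmol.items():
    print(f"{name:12s} {amount:6.1f} nmol  {equivalents[name]:.2f} equiv")
print(f"{'int. std.':12s} {internal_standard_nmol:6.1f} nmol")
```

Under these assumptions the catalyst is present at 0.1 equiv (10 mol%) and the base at 1.5 equiv.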
Dataset source:
Predicting reaction performance in C–N cross-coupling using machine learning
DOI: https://doi.org/10.1126/science.aar5169
Parameters:
- base (Categorical)
- ligand (Categorical)
- aryl_halide (Categorical)
- additive (Categorical)
Objectives:
- yield (Continuous)
Objective Distributions:
[Figure: distribution of yield]
Suzuki Doyle (size = 3696)
Suzuki-Miyaura coupling reactions.
Reaction Setup:
...
Dataset source:
A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow
DOI: https://doi.org/10.1126/science.aap9112
Bayesian reaction optimization as a tool for chemical synthesis
DOI: https://doi.org/10.1038/s41586-021-03213-y
Parameters:
- electrophile (Categorical)
- nucleophile (Categorical)
- base (Categorical)
- ligand (Categorical)
- solvent (Categorical)
Objectives:
- yield (Continuous)
Objective Distributions:
[Figure: distribution of yield]
Reductive Amination (size = 768)
Reductive amination of staurosporine.
Reaction Setup:
An SPT Labtech Mosquito® HTS liquid-handling robot was used to generate a reaction plate containing all possible reaction
conditions per aldehyde building block: 2 solvents x 2 concentrations x 3 AcOH loadings x 4 TTIP loadings = 48 reaction
conditions per aldehyde building block.
48 reaction conditions x 16 building blocks = 768 reactions.
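The 768-reaction count is a full factorial product of the factors listed above. A minimal sketch with `itertools.product`, where every level label is a hypothetical placeholder (the source gives only how many levels each factor has):

```python
from itertools import product

# Hypothetical level labels; only the number of levels per factor comes from the source.
solvents = ["solvent_1", "solvent_2"]
concentrations = ["conc_1", "conc_2"]
acoh_loadings = ["AcOH_1", "AcOH_2", "AcOH_3"]
ttip_loadings = ["TTIP_1", "TTIP_2", "TTIP_3", "TTIP_4"]
building_blocks = [f"building_block_{i}" for i in range(1, 17)]  # 16 aldehydes/ketones

# Every combination of the four condition factors.
conditions = list(product(solvents, concentrations, acoh_loadings, ttip_loadings))
# Every condition applied to every building block.
reactions = list(product(building_blocks, conditions))

print(len(conditions))  # 48
print(len(reactions))   # 768
```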
The reaction components (core, aldehyde/ketone, acetic acid, titanium tetraisopropoxide) were mixed
and allowed to sit for 4 hours before STAB (sodium triacetoxyborohydride) was added to the reaction mixtures.
After 18 hours, the reactions were quenched for analysis: 3 x 1 µL of quench solution (0.5 M AcOH in DMSO
containing PhPh internal standard) was added to each reaction well.
From each quenched reaction (4 µL total volume), 1 µL was transferred from the 1,536-well reaction plate
to a 384-well analytical plate containing 60 µL DMSO per well.
Results of this screen are presented in Figure 3 as a heat map arrayed by reaction conditions after quenching.
Reactions are represented by relative percent conversion, calculated from the UV peak areas of the product and the staurosporine core.
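The source does not spell out the conversion formula. One common way to express relative conversion from two UV peak areas is the product's share of the combined product and core signal; the sketch below uses that definition as an assumption, and it ignores any difference in UV response between the two species.

```python
def relative_percent_conversion(product_area: float, core_area: float) -> float:
    """Sketch of a relative conversion from UV peak areas.

    ASSUMPTION: conversion = 100 * product / (product + remaining core).
    Response-factor differences between product and core are not corrected for.
    """
    total = product_area + core_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * product_area / total


print(relative_percent_conversion(750.0, 250.0))  # 75.0
```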
Dataset source:
Miniaturization of popular reactions from the medicinal chemists' toolbox for ultrahigh-throughput experimentation
DOI: https://doi.org/10.1038/s44160-023-00351-1
Parameters:
- substrate (Categorical)
- AcOH_equiv (Discrete)
- TTIP_equiv (Discrete)
- solvent (Categorical)
- reaction_concentration_mM (Discrete)
Objectives:
- percent_conversion (Continuous)
Objective Distributions:
[Figure: distribution of percent_conversion]
Suzuki Cernak (size = 960)
Suzuki Cross-Coupling Informer Library.
Reaction Setup:
An SPT Labtech Mosquito® HTS liquid-handling robot was used to generate a reaction plate containing the 12 reaction conditions
(described in the source publication) for each aryl halide core/boronic acid building block combination. Testing 12 cores
against 10 boronate building blocks gives a maximum of 120 possible products, and 12 cores x 10 boronate building
blocks x 12 conditions = 1,440 reactions in the screen.
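The screen size and the product count follow from the same counts. A sketch with hypothetical labels, where the set of (core, boronate) pairs gives the number of distinct products regardless of condition:

```python
from itertools import product

# Hypothetical labels; the source gives only the counts.
cores = [f"core_{i}" for i in range(1, 13)]          # 12 aryl halide cores
boronates = [f"boronate_{j}" for j in range(1, 11)]  # 10 boronate building blocks
conditions = [f"cond_{k}" for k in range(1, 13)]     # 12 reaction conditions

screen = list(product(cores, boronates, conditions))
# A product is defined by its core/boronate pair, not by the condition used to make it.
distinct_products = {(core, boronate) for core, boronate, _ in screen}

print(len(screen))             # 1440
print(len(distinct_products))  # 120
```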
Results of this screen are presented in Figure 2 as a heatmap arrayed by reaction conditions.
Reactions are represented by the ratio of product peak area to internal standard (PhPh) peak area.
Dataset source:
Miniaturization of popular reactions from the medicinal chemists' toolbox for ultrahigh-throughput experimentation
DOI: https://doi.org/10.1038/s44160-023-00351-1
Parameters:
- halide (Categorical)
- boronate (Categorical)
- conditions (Categorical)
Objectives:
- conversion (Continuous)
Objective Distributions:
[Figure: distribution of conversion]
Alkylation Deprotection (size = 96)
Nanomole-Scale N-Alkylation/Deprotection Library Synthesis.
Reaction Setup:
Two cores were tested in nanoscale alkylation-deprotection (0.4 M stock solutions in DMF).
Twelve alkyl, allylic, and benzylic halides were used in this study.
Four unique alkylation conditions were screened.
One Boc deprotection condition (20 equiv. H2SO4 in diglyme) was used.
Nanoliter pipetting was done under an N2 atmosphere in a glove box. Stock solutions of the cores (0.4 M in DMF),
electrophile building blocks (1 M in DMF), bases (1.6 M in DMF), and deprotection acid (H2SO4 in diglyme, 2 M)
were prepared in the glove box. A source plate was constructed by pipetting 16-60 µL of the appropriate reagent stock
solution into each well. The SPT Labtech Mosquito® was used to assemble the reaction mixtures, using its multi-aspiration
feature and dispense mixing.
The next morning, the reactions were quenched by addition of saturated aqueous NaHCO3 (2 µL) and diluted with water
(1 µL) and DMSO (2 µL). The reactions were then further diluted to 10 mM with DMSO by liquid handling. For LC-MS analysis,
1 µL of each quenched, diluted crude reaction (~10 mM) was added to 60 µL DMSO, followed by shaking. Screening results are
represented as a heat map of percent yield, where values were calculated using standard curves created from isolated products.
NOTE: Elevated yields (above 100%, as determined by standard curves) reported for some reactions may be due to inaccuracies in liquid handling, causing more of the core and reagents to be charged into the reaction drop than expected.
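Percent yields from standard curves amount to inverting a calibration line. The sketch below fits an ordinary least-squares line of peak area against known product concentration and converts a sample's peak area to a percentage of an expected concentration. The calibration points, the linear model, and the function names are assumptions for illustration, not the authors' processing pipeline.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_yield(peak_area, slope, intercept, expected_conc_mM):
    """Peak area -> concentration via the standard curve -> percent of the expected concentration."""
    measured_conc_mM = (peak_area - intercept) / slope
    return 100.0 * measured_conc_mM / expected_conc_mM


# Hypothetical calibration points for one isolated product: concentration (mM) vs. peak area.
standard_conc_mM = [0.5, 1.0, 2.0, 4.0]
standard_area = [100.0, 200.0, 400.0, 800.0]

slope, intercept = fit_line(standard_conc_mM, standard_area)
print(percent_yield(300.0, slope, intercept, expected_conc_mM=2.0))  # 75.0
```

Nothing in this arithmetic caps the result at 100%, so a reaction drop charged with more material than intended reads as a yield above 100%, consistent with the note above.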
Dataset source:
Miniaturization of popular reactions from the medicinal chemists' toolbox for ultrahigh-throughput experimentation
DOI: https://doi.org/10.1038/s44160-023-00351-1
Parameters:
- electrophile (Categorical)
- core (Categorical)
- base (Categorical)
Objectives:
- yield (Continuous)
Objective Distributions:
[Figure: distribution of yield]
Chan Lam Full (size = 5684)
Name: Chan-Lam coupling of primary sulfonamides with boronic acids
Description: Screen of Chan-Lam couplings of primary sulfonamides with boronic acids, from 10.26434/chemrxiv-2024-22jrq, by the Abigail Doyle group in collaboration with Janssen. Includes a triplicate HTE screen spanning 44 sulfonamides, 2 boronic acids, 4 copper catalysts, 21 bases, and 4 solvents at 60 °C for 18 hours under an air atmosphere.
Provenance: experimenter {
name: "Shivaani Gandhi"
orcid: "0000-0003-1825-5450"
organization: "Princeton University"
email: "shivaani@princeton.edu"
}
city: "Princeton, New Jersey"
doi: "10.26434/chemrxiv-2024-22jrq"
publication_url: "https://doi.org/10.26434/chemrxiv-2024-22jrq"
record_created {
time {
value: "08/05/2024, 11:33:06"
}
person {
name: "Giselle Brown"
orcid: "0009-0009-6387-7364"
organization: "University of California, Los Angeles"
email: "gisellebrown@ucla.edu"
}
details: "ORD data entry"
}
record_modified {
time {
value: "08/05/2024, 11:34:02"
}
person {
name: "Jordan S. Compton"
orcid: "0000-0001-7099-3456"
organization: "Chemistry Capabilities, Analytical and Purification, Global Discovery ChemistryJanssen Research and Development LLC, Spring House, PA, 19477, USA"
email: "jcompto4@its.jnj.com"
}
details: "Additional Experimenter"
}
record_modified {
time {
value: "08/05/2024, 11:34:45"
}
person {
name: "Iulia I. Strambeanu"
orcid: "0000-0002-1502-5484"
organization: "Chemistry Capabilities, Analytical and Purification, Global Discovery ChemistryJanssen Research and Development LLC, Spring House, PA, 19477, USA"
email: "istrambe@its.jnj.com"
}
details: "Additional Experimenter"
}
record_modified {
time {
value: "Sat Nov 16 18:12:57 2024"
}
person {
username: "github-actions"
email: "github-actions@github.com"
}
details: "Automatic updates from the submission pipeline."
}
Notes: is_sensitive_to_moisture: false
is_sensitive_to_oxygen: false
procedure_details: "Reactions were run in 8 x 30 mm glass vial inserts in 96 well-plate Para-dox Aluminum Reaction Blocks. The reaction components were dosed according to the design shown in Figure S2 and Figure S3. First, the catalysts (2 umol per vial) and solid bases (20 umol per vial) were added by dosing 50 uL each of a stock solution in 1,2-dichloroethane (40 mM for catalysts, 0.4 M for bases) via single-channel pipette. The 1,2-dichloroethane was then removed via centrifugal evaporation using a Genevac EZ-2 evaporator (Scientific Products)(method: low boiling point, maximum temperature of 30 C, 10-30 minutes). Parylene-coated stir bars (1.98 x 4.80 mm) were loaded into each pre-dosed vial using a stir bar dispenser (V&P Scientific, catalog number VP 711A-96-1). Stock solutions of sulfonamide and boronic acid (0.1 M in sulfonamide, 0.15 M in boronic acid) were prepared in the respective reaction solvent (MeCN, MeOH, DCE, or EtOAc) and 100 uL were dosed into the reaction plate according to the design. Lastly, liquid bases were dispensed manually via single-channel pipette. The reaction block was sealed and the contents were tumble-stirred at ~600 rpm (3-Position Magnetic Tumble Stirrer, U.S. Series, V&P Scientific, Inc.) for 18 hours at 60 C. After the reaction time had elapsed, the plate was removed from the heat. Upon cooling, 500 uL of biphenyl stock solution (0.05 M in MeCN, 0.25 equiv) was added to each well as external standard. The plate was sealed, inverted 3x, and centrifuged for 5 minutes to partition all solids to the bottom of the wells. Taking care not to agitate the solution, 25 uL from each well was sampled via multi-channel pipettor and dispensed into a Thermo Scientific(TM) 96-well 1 mL polypropylene plate (SKU 278743) for analysis. Each well was diluted with 750 uL MeCN and sealed using a PlateLoc Thermal Microplate Sealer, then subjected to UPLC-MS analysis."
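The per-vial amounts in the procedure_details string follow from stock concentration times dosed volume. A sketch of that arithmetic, assuming the sulfonamide is the limiting reagent; the dictionary keys are illustrative, and the manually dispensed liquid bases are left out because their amounts are not stated.

```python
# Amount (umol) = stock concentration (M) x dosed volume (uL), since 1 M x 1 uL = 1 umol.
doses = {
    # name: (stock concentration in M, dosed volume in uL)
    "catalyst": (0.040, 50),
    "solid base": (0.4, 50),
    "sulfonamide": (0.1, 100),
    "boronic acid": (0.15, 100),
}

amounts_umol = {name: conc * vol for name, (conc, vol) in doses.items()}

# ASSUMPTION: the sulfonamide is the limiting reagent (1.0 equiv).
reference_umol = amounts_umol["sulfonamide"]
equivalents = {name: amount / reference_umol for name, amount in amounts_umol.items()}

for name, amount in amounts_umol.items():
    print(f"{name:13s} {amount:5.1f} umol  {equivalents[name]:.2f} equiv")
```

Under that assumption the catalyst loading is 0.2 equiv (20 mol%), the solid base 2.0 equiv, and the boronic acid 1.5 equiv.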
Setup: vessel {
type: WELL_PLATE
details: "96-well Para-dox Aluminium Reaction Block"
material {
type: GLASS
details: "8 x 30 mm glass vial inserts"
}
attachments {
type: MAT
details: "Para-dox plate sealed"
}
volume {
value: 1
units: MILLILITER
}
}
is_automated: false
environment {
type: FUME_HOOD
details: "HTE"
}
Conditions: temperature {
control {
type: DRY_ALUMINUM_PLATE
details: "3-Position Magnetic Tumble Stirrer, U.S. Series, V&P Scientific, Inc."
}
setpoint {
value: 60
units: CELSIUS
}
}
pressure {
control {
type: AMBIENT
}
atmosphere {
type: AIR
}
}
stirring {
type: STIR_BAR
details: "Parylene-coated stir bar 1.98 x 4.80 mm"
rate {
rpm: 600
}
}
Workups: [type: ADDITION
input {
components {
identifiers {
type: NAME
value: "biphenyl"
}
identifiers {
type: SMILES
details: "NAME resolved by the PubChem API"
value: "c1ccc(-c2ccccc2)cc1"
}
amount {
moles {
value: 5
units: MILLIMOLE
}
}
reaction_role: INTERNAL_STANDARD
}
components {
identifiers {
type: NAME
value: "MeCN"
}
identifiers {
type: SMILES
details: "NAME resolved by the PubChem API"
value: "CC#N"
}
amount {
volume {
value: 500
units: MICROLITER
}
volume_includes_solutes: true
}
reaction_role: SOLVENT
}
addition_device {
type: PIPETTE
details: "multi-channel pipette"
}
}
stirring {
type: CUSTOM
details: "Sealed plate and inverted 3x to mix. "
}
is_automated: false
, type: CUSTOM
details: "Centrifugation for 5 min to partition all solids to the bottom of the wells."
keep_phase: "liquid"
, type: ALIQUOT
details: "25 uL sampled from each well by multi-channel pipette and dispensed into a Thermo ScientificTM96-well 1 mL polypropylene plate (SKU 278743) for analysis."
amount {
volume {
value: 25
units: MICROLITER
}
}
, type: ADDITION
details: "Each well in the analysis plate was diluted with 750 uL MeCN and sealed using a PlateLoc Thermal Microplate Sealer, then subjected to UPLC-MS analysis."
input {
components {
identifiers {
type: NAME
value: "MeCN"
}
identifiers {
type: SMILES
details: "NAME resolved by the PubChem API"
value: "CC#N"
}
amount {
volume {
value: 750
units: MICROLITER
}
volume_includes_solutes: false
}
reaction_role: SOLVENT
}
addition_device {
type: PIPETTE
details: "multi-channel pipette"
}
}
is_automated: false
]
Parameters:
- boronic_acid_reactant (Categorical)
- sulfonamide_reactant (Categorical)
- catalyst_catalyst (Categorical)
- base_reagent (Categorical)
- solvent (Categorical)
Objectives:
- desired_yield (Continuous)
- undesired_yield (Continuous)
Objective Distributions:
[Figure: distribution of desired_yield]
[Figure: distribution of undesired_yield]
amide coupling hte (size = 648)
Amide Coupling HTE Dataset: 3×3×72 Fully Mapped Subset
Dataset Overview:
This dataset represents a FULLY MAPPED subset extracted from a large-scale high-throughput
experimentation (HTE) campaign for amide coupling reactions. It contains 780 experiments
(with some replicates) covering all combinations of:
- 3 structurally diverse amines
- 3 structurally diverse carboxylic acids
- 72 curated reaction conditions
This is a COMPLETE FACTORIAL design: every amine × acid × condition combination has been
tested, so optimization algorithms can be benchmarked by looking up measured yields directly,
without the need for surrogate modeling.
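Because every combination has a measured yield, an optimizer can be scored against the table itself. The sketch below uses random search as a stand-in for any optimizer and fills the lookup table with randomly generated placeholder yields; with the real dataset, the table would instead map each (amine, acid, condition) triple to its measured yield.

```python
import random
from itertools import product

# PLACEHOLDER data: random numbers stand in for the measured yields (fractions 0.0-1.0).
amines = ["amine_1", "amine_2", "amine_3"]
acids = ["acid_1", "acid_2", "acid_3"]
conditions = [f"cond_{i}" for i in range(1, 73)]  # 72 hypothetical condition labels

rng = random.Random(0)
lookup = {combo: rng.random() for combo in product(amines, acids, conditions)}


def random_search(table, budget, seed=0):
    """Query `budget` distinct combinations at random and record the best yield seen after each query."""
    picker = random.Random(seed)
    queried = picker.sample(list(table), budget)
    best, trace = float("-inf"), []
    for combo in queried:
        best = max(best, table[combo])  # exact lookup: no surrogate model involved
        trace.append(best)
    return trace


trace = random_search(lookup, budget=20)
print(len(lookup))                        # 648
print(trace[-1] <= max(lookup.values()))  # True
```

Comparing how quickly the best-so-far trace approaches the table maximum gives a simple benchmark score; another optimizer can be evaluated by replacing `random_search` while keeping the same lookup table.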
Substrate Scope:
Amines:
1. CC(=O)c1ccc(N)cc1 (4-aminoacetophenone)
2. Cc1ccc(CN)cc1 (4-methylbenzylamine)
3. Cc1ccc2cccc(N)c2n1 (8-aminoquinoline derivative)
Carboxylic Acids:
1. O=C(O)Cc1ccc2c(c1)OCO2 (benzodioxole acetic acid)
2. O=C(O)COc1ccccc1 (phenoxyacetic acid)
3. O=C(O)Cc1ccc2c(c1)C(=O)c1ccccc1CO2 (dibenzoxepinone acetic acid, isoxepac)
Reaction Conditions:
The 72 conditions represent carefully curated combinations from a larger space of:
- 42 activation reagents (R1-R42): DCC, EDC, HATU, HBTU, PyBOP, TBTU, and others
- 11 additives (A1-A11): HOBt, HOAt, Oxyma, and variants
- 8 bases (B1-B8): DIPEA, TEA, NMM, DMAP, and others
- 1 solvent (S1): DMF (dimethylformamide)
Each condition is encoded as a string: "R##_A##_B##_S##"
Examples: "R2_A5_B1_S1", "R25_A1_B7_S1", etc.
These 72 conditions represent a strategically selected subset (~2% coverage) of the full
3,696 possible combinations (42×11×8×1), chosen based on chemical compatibility, literature
precedent, and preliminary screening results.
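The size of the full reagent space, and the fraction of it covered by the 72 selected conditions, can be computed directly:

```python
n_reagents, n_additives, n_bases, n_solvents = 42, 11, 8, 1

full_space = n_reagents * n_additives * n_bases * n_solvents
coverage = 72 / full_space  # fraction of the full space that was sampled

print(full_space)         # 3696
print(f"{coverage:.1%}")  # 1.9%
```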
NOTE: Conditions are pre-packaged combinations encoded as strings (e.g., "R25_A1_B7_S1").
The encoding represents:
R## = Activation reagent (coupling agent)
A## = Additive
B## = Base
S## = Solvent (always S1 = DMF in this subset)
Due to the complexity of the ionic salts and multi-component reagents in the original data,
conditions should be treated as opaque categorical identifiers rather than decoded into their
individual components. Users cannot independently vary the individual components
(activation reagent, additive, base) within this dataset.
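Treating each condition string as an opaque category means giving every distinct string its own indicator column, with no attempt to split it into reagent, additive, and base. A minimal one-hot encoding sketch in plain Python (the helper name is hypothetical):

```python
def one_hot(values):
    """Map each distinct condition string to its own indicator vector.

    The condition string is used as a single opaque category; it is never
    split into reagent/additive/base parts.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors


cats, vecs = one_hot(["R2_A5_B1_S1", "R25_A1_B7_S1", "R2_A5_B1_S1"])
print(cats)  # ['R25_A1_B7_S1', 'R2_A5_B1_S1']
print(vecs)  # [[0, 1], [1, 0], [0, 1]]
```

Sorting is purely lexicographic, so the column order carries no chemical meaning, which is the intent when conditions are opaque identifiers.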
The 72 conditions use combinations of the reagents listed below, encoded as described above (A1 = no additive; B1 = no base; S1 = DMF throughout).
All reagent identities below are verified against the Open Reaction Database
(Dataset ID: ord-47eaacc46c3a4487bbdf99adb1a15e41).
ACTIVATION/COUPLING REAGENTS (R##) - 35 used in this dataset
R2 - DIC
SMILES: CC(C)N=C=NC(C)C
R4 - CDI
SMILES: C1=CN(C=N1)C(=O)N2C=CN=C2
R6 - CDMT
SMILES: COC1=NC(=NC(=N1)Cl)OC
R7 - 2,4-Dichloro-6-methoxy-1,3,5-triazine
SMILES: COC1=NC(=NC(=N1)Cl)Cl
R8 - EDC-HCl
SMILES: CCN=C=NCCCN(C)C.Cl
R9 - DCC
SMILES: C1CCC(CC1)N=C=NC2CCCCC2
R11 - EEDQ
SMILES: CCOC1C=CC2=CC=CC=C2N1C(=O)OCC
R12 - BOP-Cl
SMILES: C1COC(=O)N1P(=O)(N2CCOC2=O)Cl
R13 - TFFH
SMILES: CN(C)C(=[N+](C)C)F.F[P-](F)(F)(F)(F)F
R17 - TPTU
SMILES: [B-](F)(F)(F)F.CN(C)C(=[N+](C)C)ON1C=CC=CC1=O
R18 - DEPBT
SMILES: CCOP(=O)(OCC)ON1C(=O)C2=CC=CC=C2N=N1
R19 - TSTU
SMILES: [B-](F)(F)(F)F.CN(C)C(=[N+](C)C)ON1C(=O)CCC1=O
R20 - IIDQ
SMILES: CC(C)COC1C=CC2=CC=CC=C2N1C(=O)OCC(C)C
R21 - BTFFH
SMILES: C1CCN(C1)C(=[N+]2CCCC2)F.F[P-](F)(F)(F)(F)F
R22 - TBTU
SMILES: [B-](F)(F)(F)F.CN(C)C(=[N+](C)C)ON1C2=CC=CC=C2N=N1
R23 - TOTU
SMILES: [B-](F)(F)(F)F.CCOC(=O)C(=NOC(=[N+](C)C)N(C)C)C#N
R24 - PyClU
SMILES: C1CCN(C1)C(=[N+]2CCCC2)Cl.F[P-](F)(F)(F)(F)F
R25 - TDBTU
SMILES: [B-](F)(F)(F)F.CN(C)C(=[N+](C)C)ON1C(=O)C2=CC=CC=C2N=N1
R26 - TCTU
SMILES: [B-](F)(F)(F)F.CN(C)C(=[N+](C)C)ON1C2=C(C=CC(=C2)Cl)N=N1
R27 - TNTU
SMILES: [B-](F)(F)(F)F.CN(C)C(=[N+](C)C)ON1C(=O)C2C3CC(C2C1=O)C=C3
R28 - HBTU
SMILES: CN(C)C(=[N+](C)C)ON1C2=CC=CC=C2N=N1.F[P-](F)(F)(F)(F)F
R29 - HATU
SMILES: CN(C)C(=[N+](C)C)ON1C2=C(C=CC=N2)N=N1.F[P-](F)(F)(F)(F)F
R30 - FDPP
SMILES: C1=CC=C(C=C1)P(=O)(C2=CC=CC=C2)OC3=C(C(=C(C(=C3F)F)F)F)F
R31 - HCTU
SMILES: CN(C)C(=[N+](C)C)ON1C2=C(C=CC(=C2)Cl)N=N1.F[P-](F)(F)(F)(F)F
R32 - PyCloP
SMILES: C1CCN(C1)[P+](N2CCCC2)(N3CCCC3)Cl.F[P-](F)(F)(F)(F)F
R33 - COMU
SMILES: CCOC(=O)C(=NOC(=[N+](C)C)N1CCOCC1)C#N.F[P-](F)(F)(F)(F)F
R34 - HBPyU
SMILES: C1CCN(C1)C(=[N+]2CCCC2)ON3C4=CC=CC=C4N=N3.F[P-](F)(F)(F)(F)F
R35 - BOP
SMILES: CN(C)[P+](N(C)C)(N(C)C)ON1C2=CC=CC=C2N=N1.F[P-](F)(F)(F)(F)F
R36 - HDMC
SMILES: C[N+](=C(N1CCOCC1)N2C3=C(C=C(C=C3)Cl)[N+](=N2)[O-])C.F[P-](F)(F)(F)(F)F
R37 - PyBrOP
SMILES: C1CCN(C1)[P+](N2CCCC2)(N3CCCC3)Br.F[P-](F)(F)(F)(F)F
R38 - PyBOP
SMILES: C1CCN(C1)[P+](N2CCCC2)(N3CCCC3)ON4C5=CC=CC=C5N=N4.F[P-](F)(F)(F)(F)F
R39 - PyAOP
SMILES: C1CCN(C1)[P+](N2CCCC2)(N3CCCC3)ON4C5=C(C=CC=N5)N=N4.F[P-](F)(F)(F)(F)F
R40 - PyOxim
SMILES: CCOC(=O)C(=NO[P+](N1CCCC1)(N2CCCC2)N3CCCC3)C#N.F[P-](F)(F)(F)(F)F
R41 - CITU
SMILES: CN(C)C(=[N+](C)C)ON1C(=O)C2=C(C1=O)C(=C(C(=C2Cl)Cl)Cl)Cl.F[P-](F)(F)(F)(F)F
R42 - PyClocK
SMILES: C1CCN(C1)[P+](N2CCCC2)(N3CCCC3)ON4C5=C(C=CC(=C5)Cl)N=N4.F[P-](F)(F)(F)(F)F
ADDITIVES (A##) - 10 used in this dataset
A1 - No additive
(Empty - some coupling reagents don't require additives)
A2 - N-Hydroxysuccinimide
SMILES: C1CC(=O)N(C1=O)O
A4 - HOBt
SMILES: C1=CC=C2C(=C1)N=NN2O
A5 - HOAt
SMILES: C1=CC2=C(N=C1)N(N=N2)O
A6 - Oxyma
SMILES: CCOC(=O)C(=NO)C#N
A7 - N-Hydroxyphthalimide
SMILES: C1=CC=C2C(=C1)C(=O)N(C2=O)O
A8 - 6-Cl-HOBt
SMILES: C1=CC2=C(C=C1Cl)N(N=N2)O
A9 - Pentafluorophenol
SMILES: C1(=C(C(=C(C(=C1F)F)F)F)F)O
A10 - Oxyma-B
SMILES: CN1C(=C(C(=O)N(C1=O)C)N=O)O
A11 - 2,4,5-Trichlorophenol
SMILES: Oc1cc(Cl)c(Cl)cc1Cl
BASES (B##) - 7 used in this dataset
B1 - No base
B3 - NMM
SMILES: CN1CCOCC1
B4 - TEA
SMILES: CCN(CC)CC
B5 - 2,6-Lutidine
SMILES: CC1=NC(=CC=C1)C
B6 - DMAP
SMILES: CN(C)C1=CC=NC=C1
B7 - DIPEA
SMILES: CCN(C(C)C)C(C)C
B8 - DBU
SMILES: C1CCC2=NCCCN2CC1
SOLVENTS (S##)
S1 - DMF (N,N-Dimethylformamide)
SMILES: CN(C)C=O
All reactions in this dataset use DMF as the solvent.
Example Condition Decoded:
R8_A4_B7_S1 means:
R8 = EDC-HCl (1-Ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride)
A4 = HOBt (Hydroxybenzotriazole)
B7 = DIPEA (N,N-Diisopropylethylamine)
S1 = DMF (N,N-Dimethylformamide)
R29_A1_B7_S1 means:
R29 = HATU (Hexafluorophosphate Azabenzotriazole Tetramethyl Uronium)
A1 = No additive (HATU already incorporates the HOAt moiety)
B7 = DIPEA
S1 = DMF
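For reading results, a condition string can still be decoded against the reagent tables above. The sketch below parses the R##_A##_B##_S## pattern with a regular expression and looks each code up in a small excerpt of those tables (only the codes used in the two examples); unknown codes are returned unchanged. Decoding of this kind is meant for human interpretation, in keeping with the recommendation to model conditions as opaque identifiers.

```python
import re

# Small excerpt of the reagent tables above; the remaining codes can be added the same way.
REAGENTS = {"R8": "EDC-HCl", "R29": "HATU"}
ADDITIVES = {"A1": "No additive", "A4": "HOBt"}
BASES = {"B1": "No base", "B7": "DIPEA"}
SOLVENTS = {"S1": "DMF"}

CONDITION_RE = re.compile(r"^(R\d+)_(A\d+)_(B\d+)_(S\d+)$")


def decode_condition(code: str) -> dict:
    """Split an 'R##_A##_B##_S##' string and look up each part (unknown codes pass through)."""
    match = CONDITION_RE.match(code)
    if match is None:
        raise ValueError(f"not a condition string: {code!r}")
    r, a, b, s = match.groups()
    return {
        "reagent": REAGENTS.get(r, r),
        "additive": ADDITIVES.get(a, a),
        "base": BASES.get(b, b),
        "solvent": SOLVENTS.get(s, s),
    }


print(decode_condition("R8_A4_B7_S1"))
# {'reagent': 'EDC-HCl', 'additive': 'HOBt', 'base': 'DIPEA', 'solvent': 'DMF'}
```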
Experimental Details:
Platform: In-house automated HTE platform with liquid handling robots
Scale: Microscale reactions in well plates
Analysis: LCMS with internal standard
Yield determination: Fractional yields (0.0 to 1.0) based on product formation
Quality control: Automated data collection minimizes human error
Replication: Some conditions were tested multiple times (780 total measurements for 648 unique combinations)
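The dataset description does not say how replicate measurements are combined. One simple option, shown here as an assumption, is to average replicates so that each unique (amine, acid, condition) combination gets a single yield; the rows below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (amine, acid, condition, yield); yields are fractions in [0, 1].
rows = [
    ("amine_1", "acid_1", "R8_A4_B7_S1", 0.62),
    ("amine_1", "acid_1", "R8_A4_B7_S1", 0.58),
    ("amine_2", "acid_3", "R29_A1_B7_S1", 0.91),
]

# Collect every measurement under its (amine, acid, condition) key.
grouped = defaultdict(list)
for amine, acid, condition, y in rows:
    grouped[(amine, acid, condition)].append(y)

# ASSUMPTION: one value per unique combination, the mean over its replicates.
aggregated = {key: mean(ys) for key, ys in grouped.items()}
n_replicates = {key: len(ys) for key, ys in grouped.items()}

print(len(rows), "measurements ->", len(aggregated), "unique combinations")
```

Keeping `n_replicates` alongside the means makes it possible to weight or flag combinations that were measured more than once.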
Parameters:
- amine (Categorical)
- acid (Categorical)
- condition (Categorical)
Objectives:
- yield (Continuous)