Isomer C7H8N2O2 (using TDC)
About
This objective function rewards molecules, given as SMILES or SELFIES strings, whose molecular formula is C7H8N2O2. It belongs to the Isomers family of the GuacaMol benchmark [Brown et al., 2019]. We compute it using the Therapeutics Data Commons (TDC) oracle [Huang et al., 2021].
Warning
There is currently a discrepancy between the values TDC reports in its documentation and the values we get when we run this oracle. Read more here.
Prerequisites
None. This black box should run out-of-the-box.
How to run
import numpy as np
from poli.objective_repository import (
    IsomerC7H8N2O2ProblemFactory,
    IsomerC7H8N2O2BlackBox,
)
# Creating the black box
f = IsomerC7H8N2O2BlackBox(
    string_representation="SMILES"  # SMILES by default, can be SELFIES
)
# Creating a problem
problem = IsomerC7H8N2O2ProblemFactory().create()
f, x0 = problem.black_box, problem.x0
# Example input: (taken from the TDC)
x = np.array(["CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1"])
# Querying:
y = f(x)
print(y) # Should be close to [[2.19875911e-34]]
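The tiny expected value printed above can be reproduced by hand. The following is a minimal sketch, not TDC's or poli's implementation. It assumes the isomer score works as we understand the GuacaMol description: one Gaussian term per element in the target formula (standard deviation 1), one Gaussian term for the total atom count (standard deviation 2), combined with a geometric mean. To stay free of RDKit it works on molecular formulas rather than SMILES. The formula C21H25ClN2O3S for the example input is our own hand count, and the helper names `parse_formula` and `isomer_score` are hypothetical.

```python
import math
import re


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple molecular formula such as 'C7H8N2O2' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def gaussian_log_score(value: float, mu: float, sigma: float) -> float:
    """Log of an unnormalized Gaussian that equals 1 when value == mu."""
    return -0.5 * ((value - mu) / sigma) ** 2


def isomer_score(candidate: str, target: str = "C7H8N2O2") -> float:
    """Geometric mean of per-element and total-atom-count Gaussian terms.

    Assumption: sigma = 1 for each element of the target formula and
    sigma = 2 for the total number of atoms. Elements that appear only
    in the candidate affect the score through the total count alone.
    """
    target_counts = parse_formula(target)
    candidate_counts = parse_formula(candidate)

    log_scores = [
        gaussian_log_score(candidate_counts.get(element, 0), count, 1.0)
        for element, count in target_counts.items()
    ]
    log_scores.append(
        gaussian_log_score(
            sum(candidate_counts.values()), sum(target_counts.values()), 2.0
        )
    )
    # Averaging the logs and exponentiating is the geometric mean,
    # computed in a way that avoids underflow for very small terms.
    return math.exp(sum(log_scores) / len(log_scores))


example = isomer_score("C21H25ClN2O3S")  # hand-counted formula of the example input
perfect = isomer_score("C7H8N2O2")       # a molecule with exactly the target formula
print(example, perfect)
```

Under these assumptions, a molecule with exactly the target formula scores 1, while the example input has far too many carbon and hydrogen atoms, which drives its score down to roughly 2.2e-34.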
How to cite
If you use this black box, we expect that you cite the following resources:
[1] Brown, Nathan, Marco Fiscato, Marwin H.S. Segler, and Alain C. Vaucher. “GuacaMol: Benchmarking Models for de Novo Molecular Design.” Journal of Chemical Information and Modeling 59, no. 3 (March 25, 2019): 1096–1108. https://doi.org/10.1021/acs.jcim.8b00839.
[2] Huang, Kexin, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik. “Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development.” Proceedings of Neural Information Processing Systems, NeurIPS Datasets and Benchmarks, 2021.
[3] González-Duque, M., Bartels, S., & Michael, R. (2024). poli: a libary of discrete sequence objectives [Computer software]. MachineLearningLifeScience/poli
@article{Brown:guacamol:2019,
title={GuacaMol: Benchmarking Models for de Novo Molecular Design},
volume={59},
ISSN={1549-9596, 1549-960X},
DOI={10.1021/acs.jcim.8b00839},
number={3},
journal={Journal of Chemical Information and Modeling},
author={Brown, Nathan and Fiscato, Marco and Segler, Marwin H.S. and Vaucher, Alain C.},
year={2019},
month=mar,
pages={1096–1108},
language={en}
}
@article{Huang:TDC:2021,
title={Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development},
author={Huang, Kexin and Fu, Tianfan and Gao, Wenhao and Zhao, Yue and Roohani, Yusuf and Leskovec, Jure and Coley, Connor W and Xiao, Cao and Sun, Jimeng and Zitnik, Marinka},
journal={Proceedings of Neural Information Processing Systems, NeurIPS Datasets and Benchmarks},
year={2021}
}
@software{Gonzalez-Duque:poli:2024,
author = {González-Duque, Miguel and Bartels, Simon and Michael, Richard},
month = jan,
title = {{poli: a libary of discrete sequence objectives}},
url = {https://github.com/MachineLearningLifeScience/poli},
version = {0.0.1},
year = {2024}
}