DRD2 Docking (using TDC)#
About#
This objective function scores a small molecule (provided as a SMILES or SELFIES string) against the dopamine type 2 receptor (DRD2). Rather than running a docking simulation, the underlying black box uses a trained random forest classifier [Olivecrona et al., 2017]. We compute it using the Therapeutics Data Commons' oracle [Huang et al., 2021]. This objective function is part of the Practical Molecular Optimization benchmark [Gao et al., 2022].
Prerequisites#
None. This black box should run out-of-the-box.
How to run#
import numpy as np
from poli.objective_repository import (
    DRD2ProblemFactory,
    DRD2BlackBox,
)

# Creating the black box directly
f = DRD2BlackBox(
    string_representation="SMILES"  # SMILES by default, can be SELFIES
)

# Alternatively, creating a problem, which also provides an initial input x0
problem = DRD2ProblemFactory().create()
f, x0 = problem.black_box, problem.x0

# Example input (taken from the TDC)
x = np.array(["CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1"])

# Querying
y = f(x)
print(y)  # Should be close to [[0.001546]]
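The comment on the last line says the result should be close to, rather than exactly equal to, the reference value. A minimal sketch of a tolerance-based check follows. The helper name `matches_reference` and the absolute tolerance of `1e-3` are assumptions chosen for illustration and are not part of poli; the nested list stands in for the array printed above, so the snippet runs without poli installed.

```python
import math


def matches_reference(y, expected, abs_tol=1e-3):
    """Return True if every entry of `y` is within `abs_tol` of `expected`.

    Both arguments are nested sequences with one row per queried molecule
    and one score per row, mirroring the [[0.001546]] output shown above.
    The helper and its default tolerance are hypothetical, not poli API.
    """
    if len(y) != len(expected):
        return False
    for row, expected_row in zip(y, expected):
        if len(row) != len(expected_row):
            return False
        for value, reference in zip(row, expected_row):
            if not math.isclose(value, reference, abs_tol=abs_tol):
                return False
    return True


# Stand-in for the output of f(x); with poli installed, y.tolist() could be
# passed instead of this hard-coded list.
y_example = [[0.001546]]
print(matches_reference(y_example, [[0.001546]]))
```

An absolute tolerance is used here because the reference score is itself close to zero, where a purely relative tolerance would be very strict.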
How to cite#
If you use this black box, please cite the following resources:
[1] Olivecrona, Marcus, Thomas Blaschke, Ola Engkvist, and Hongming Chen. “Molecular De-Novo Design through Deep Reinforcement Learning.” Journal of Cheminformatics 9, no. 1 (September 4, 2017): 48. https://doi.org/10.1186/s13321-017-0235-x.
[2] Huang, Kexin, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik. “Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development.” Proceedings of Neural Information Processing Systems, NeurIPS Datasets and Benchmarks, 2021.
[3] Gao, Wenhao, Tianfan Fu, Jimeng Sun, and Connor W. Coley. “Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization,” 2022. https://openreview.net/forum?id=yCZRdI0Y7G.
[4] González-Duque, M., Bartels, S., & Michael, R. (2024). poli: a library of discrete sequence objectives [Computer software]. MachineLearningLifeScience/poli
@article{Olivecrona:DeNovoRL:2017,
title={Molecular de-novo design through deep reinforcement learning},
volume={9},
ISSN={1758-2946},
DOI={10.1186/s13321-017-0235-x},
number={1},
journal={Journal of Cheminformatics},
author={Olivecrona, Marcus and Blaschke, Thomas and Engkvist, Ola and Chen, Hongming},
year={2017},
month=sep,
pages={48},
language={en}
}
@article{Huang:TDC:2021,
title={Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development},
author={Huang, Kexin and Fu, Tianfan and Gao, Wenhao and Zhao, Yue and Roohani, Yusuf and Leskovec, Jure and Coley, Connor W and Xiao, Cao and Sun, Jimeng and Zitnik, Marinka},
journal={Proceedings of Neural Information Processing Systems, NeurIPS Datasets and Benchmarks},
year={2021}
}
@inproceedings{Gao:PMO:2022,
title={Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization},
url={https://openreview.net/forum?id=yCZRdI0Y7G},
author={Gao, Wenhao and Fu, Tianfan and Sun, Jimeng and Coley, Connor W.},
year={2022},
month=jun,
language={en}
}
@software{Gonzalez-Duque:poli:2024,
author = {González-Duque, Miguel and Bartels, Simon and Michael, Richard},
month = jan,
title = {{poli: a library of discrete sequence objectives}},
url = {https://github.com/MachineLearningLifeScience/poli},
version = {0.0.1},
year = {2024}
}