

Transformer-decoder GPT models for generating virtual screening libraries of HMG-Coenzyme A reductase inhibitors: effects of temperature, prompt-length and transfer-learning strategies

Cafiero, M. ORCID: https://orcid.org/0000-0002-4895-1783 (2024) Transformer-decoder GPT models for generating virtual screening libraries of HMG-Coenzyme A reductase inhibitors: effects of temperature, prompt-length and transfer-learning strategies. Journal of Chemical Information and Modeling. ISSN 1549-960X

Full text: Published Version (Open Access), available under a Creative Commons Attribution license.

It is advisable to refer to the publisher's version if you intend to cite from this work.

DOI: 10.1021/acs.jcim.4c01309

Abstract

Attention-based decoder models were used to generate libraries of novel inhibitors of the HMG-Coenzyme A reductase (HMGCR) enzyme. These deep neural network models were pre-trained on previously synthesized drug-like molecules from the ZINC15 database to learn the syntax of SMILES strings. They were then fine-tuned on a set of ~1,000 molecules that inhibit HMGCR. The number of layers used for pre-training and for fine-tuning was varied to find the optimal balance for robust library generation. Virtual screening libraries were also generated at different temperatures and numbers of input tokens (prompt-length) to identify the settings that yield the most desirable molecular properties. The resulting libraries were screened against several criteria:

- IC50 values predicted by a Dense Neural Network (DNN) trained on experimental HMGCR IC50 values
- docking scores from AutoDock Vina (via Dockstring)
- a calculated Quantitative Estimate of Druglikeness (QED)
- Tanimoto similarity to known HMGCR inhibitors

Models with 50/50 or 25/75% pre-trained/fine-tuned layers, sampled at a non-zero temperature with shorter prompt-lengths, produced the most robust libraries. The DNN-predicted IC50 values correlated well with docking scores and statin-similarity. Of the generated molecules, 42% were classified as statin-like by k-means clustering. The rosuvastatin-like group had the lowest IC50 values and the lowest docking scores.
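In decoder models like these, "temperature" rescales the next-token logits before sampling. A temperature of 0 reduces to greedy decoding, which always picks the most likely token. Larger values flatten the distribution and increase diversity.

The following is a minimal plain-Python sketch of that idea. It is not the author's code, and the function names and example logits are hypothetical. It assumes the model returns one raw logit per vocabulary token (for example, per SMILES token). It also assumes generation would start from a prompt made of the first few tokens of a SMILES string, with this step repeated for each new token.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw next-token logits into probabilities scaled by temperature.

    Hypothetical helper: temperature must be > 0 (use greedy argmax for 0).
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding for 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng=random):
    """Pick the index of the next token.

    Temperature 0 falls back to greedy decoding (deterministic argmax);
    any positive temperature samples from the rescaled distribution.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.1]  # hypothetical logits for 3 tokens
    for t in (0.5, 1.0, 2.0):
        # Higher temperature -> flatter distribution over the tokens
        print(t, [round(p, 3) for p in softmax_with_temperature(example_logits, t)])
    print("greedy:", sample_next_token(example_logits, 0))
```

Sampling repeatedly from a short prompt at a non-zero temperature lets many distinct molecules be generated from the same starting tokens. At temperature 0, the same prompt always yields the same string.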

Item Type: Article
Refereed: Yes
Divisions: Life Sciences > School of Chemistry, Food and Pharmacy > Department of Chemistry
ID Code: 119218
Publisher: American Chemical Society
