Transformer-decoder GPT models for generating virtual screening libraries of HMG-Coenzyme A reductase inhibitors: effects of temperature, prompt-length and transfer-learning strategies
Cafiero, M.
DOI: 10.1021/acs.jcim.4c01309

Abstract

Attention-based decoder models were used to generate libraries of novel inhibitors of the HMG-Coenzyme A reductase (HMGCR) enzyme. These deep neural network models were pre-trained on previously synthesized drug-like molecules from the ZINC15 database to learn the syntax of SMILES strings, and then fine-tuned on a set of ~1,000 molecules that inhibit HMGCR. The numbers of layers used for pre-training and fine-tuning were varied to find the optimal balance for robust library generation. Virtual screening libraries were also generated with different temperatures and numbers of input tokens (prompt-lengths) to find the most desirable molecular properties. The resulting libraries were screened against several criteria: IC50 values predicted by a Dense Neural Network (DNN) trained on experimental HMGCR IC50 values, docking scores from AutoDock Vina (via Dockstring), the calculated Quantitative Estimate of Druglikeness (QED), and Tanimoto similarity to known HMGCR inhibitors. Models with a 50/50 or 25/75 pre-trained/fine-tuned layer split, a non-zero sampling temperature, and shorter prompt-lengths produced the most robust libraries, and the DNN-predicted IC50 values correlated well with docking scores and statin similarity. Of the generated molecules, 42% were classified as statin-like by k-means clustering, with the rosuvastatin-like cluster showing the lowest IC50 values and the lowest (most favorable) docking scores.
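To make the layer-split transfer-learning strategy concrete, below is a minimal PyTorch sketch in which a chosen fraction of decoder layers keeps its pre-trained weights frozen while the remaining layers are updated during fine-tuning. The helper name and the assumption that the layers live in an nn.ModuleList are illustrative, not the paper's implementation.

```python
import torch.nn as nn

def freeze_pretrained_layers(decoder_layers: nn.ModuleList, frozen_fraction: float) -> None:
    """Freeze the first `frozen_fraction` of decoder layers so that only the
    remaining layers receive gradient updates during fine-tuning.

    frozen_fraction=0.5 gives a 50/50 pre-trained/fine-tuned split;
    frozen_fraction=0.25 gives 25/75. (Hypothetical helper; the paper's
    exact layer-assignment scheme may differ.)
    """
    n_frozen = int(len(decoder_layers) * frozen_fraction)
    for layer in decoder_layers[:n_frozen]:
        for param in layer.parameters():
            param.requires_grad = False
```

For a GPT-2-style Hugging Face model, for example, this could be called as freeze_pretrained_layers(model.transformer.h, 0.5) to realize the 50/50 split.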
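The temperature and prompt-length effects can be illustrated with a generic autoregressive sampling loop. The model and tokenizer below are stand-ins, assumed to return next-token logits of shape (batch, sequence, vocab) and to follow the usual encode/decode/eos_token_id conventions; they are not the paper's trained artifacts.

```python
import torch

@torch.no_grad()
def generate_smiles(model, tokenizer, prompt: str,
                    temperature: float = 0.8, max_new_tokens: int = 100) -> str:
    """Sample one SMILES string token-by-token from a decoder-only model."""
    ids = tokenizer.encode(prompt)                      # prompt-length = number of seed tokens
    for _ in range(max_new_tokens):
        logits = model(torch.tensor([ids]))[0, -1, :]   # logits for the next token
        if temperature <= 0:
            next_id = int(torch.argmax(logits))         # temperature 0: greedy, deterministic
        else:
            probs = torch.softmax(logits / temperature, dim=-1)  # flatter as T grows
            next_id = int(torch.multinomial(probs, num_samples=1))
        if next_id == tokenizer.eos_token_id:           # end-of-SMILES token
            break
        ids.append(next_id)
    return tokenizer.decode(ids)
```

Lower temperatures concentrate probability on high-likelihood tokens (reaching the deterministic limit at T=0), while non-zero temperatures and shorter prompts leave more room for diverse completions, matching the abstract's finding that these settings gave the most robust libraries.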
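Three of the screening criteria (all except the DNN-predicted IC50) can be approximated with RDKit and dockstring, as in the sketch below. The load_target/dock calls are the documented dockstring API; treating "HMGCR" as an available target name is an assumption based on the abstract's use of Dockstring for this enzyme.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, QED
from dockstring import load_target

def screen(smiles: str, reference_smiles: str, target_name: str = "HMGCR"):
    """Score one generated molecule on QED, Tanimoto similarity to a known
    inhibitor, and an AutoDock Vina docking score via dockstring."""
    mol = Chem.MolFromSmiles(smiles)
    ref = Chem.MolFromSmiles(reference_smiles)
    if mol is None or ref is None:
        return None                                     # discard invalid SMILES

    qed = QED.qed(mol)                                  # druglikeness, 0 (poor) to 1 (drug-like)

    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    ref_fp = AllChem.GetMorganFingerprintAsBitVect(ref, radius=2, nBits=2048)
    tanimoto = DataStructs.TanimotoSimilarity(fp, ref_fp)

    score, _aux = load_target(target_name).dock(smiles)  # lower (more negative) = better
    return {"QED": qed, "Tanimoto": tanimoto, "VinaScore": score}
```

A generated library would be filtered by calling screen on each candidate against one or more known HMGCR inhibitors as the reference.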
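Finally, a sketch of the statin-likeness classification step: k-means clustering of the generated library on Morgan fingerprints. The fingerprint parameters and the number of clusters are assumptions; the paper's descriptor set and cluster count may differ.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.cluster import KMeans

def cluster_library(smiles_list, n_clusters=4, n_bits=2048):
    """Group generated molecules by k-means on Morgan fingerprint vectors."""
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue                                    # skip invalid SMILES
        bv = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
        arr = np.zeros((n_bits,), dtype=np.int8)
        DataStructs.ConvertToNumpyArray(bv, arr)        # bit vector -> numpy row
        fps.append(arr)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(np.array(fps))
    return labels                                       # cluster index per valid molecule
```

Clusters could then be labeled statin-like (or rosuvastatin-like) by comparing their members to known statins, along the lines of the abstract's 42% statin-like classification.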