Cafiero, M.
ORCID: https://orcid.org/0000-0002-4895-1783
(2025)
Variable-temperature token sampling in decoder-GPT molecule-generation can produce more robust and potent virtual screening libraries.
Physical Chemistry Chemical Physics, 27 (27).
pp. 14455-14468.
ISSN 1463-9076
doi: 10.1039/D5CP00692A
Abstract/Summary
Token generation in generative pretrained transformers (GPTs) that produce text, code, or molecules often uses conventional approaches such as greedy decoding, temperature-based sampling, or top-k or top-p techniques. This work shows that for a model trained to generate inhibitors of the enzyme HMG-Coenzyme-A reductase, a variable-temperature approach using a temperature ramp during the inference process produces larger sets of molecules (screening libraries) than those produced by either greedy decoding or single-temperature sampling. These libraries also have lower predicted IC50 values, lower docking scores, and lower synthetic accessibility scores than libraries produced by the other sampling techniques, especially when used with very short prompt lengths. This work explores several variable-temperature schemes for generating molecules with a GPT and recommends a sigmoidal temperature ramp early in the generation process.
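As an illustration of the idea in the abstract, the following is a minimal sketch of token sampling under a sigmoidal temperature ramp. It is not the article's implementation: the function names, the ramp direction, and the values of `t_start`, `t_end`, `midpoint`, and `steepness` are assumptions chosen for the example, and the toy logits stand in for the output of a real model.

```python
import math
import random


def sigmoid_temperature(step, t_start=0.5, t_end=1.5, midpoint=5.0, steepness=1.0):
    """Temperature at a generation step, moving smoothly from t_start to t_end.

    All default values are illustrative assumptions, not values from the article.
    """
    frac = 1.0 / (1.0 + math.exp(-steepness * (step - midpoint)))
    return t_start + (t_end - t_start) * frac


def sample_token(logits, temperature, rng):
    """Sample a token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy stand-in for per-step model logits over a four-token vocabulary.
rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, 0.1]
tokens = [
    sample_token(toy_logits, sigmoid_temperature(step), rng)
    for step in range(12)
]
```

With these assumed settings the temperature changes most rapidly around step 5 and is nearly constant afterwards, so the ramp acts early in generation. Swapping `t_start` and `t_end` reverses the direction of the ramp.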
| Item Type | Article |
| URI | https://centaur.reading.ac.uk/id/eprint/123336 |
| Identification Number/DOI | 10.1039/D5CP00692A |
| Refereed | Yes |
| Divisions | Life Sciences > School of Chemistry, Food and Pharmacy > Department of Chemistry |
| Publisher | Royal Society of Chemistry |