Variable-temperature token sampling in decoder-GPT molecule-generation can produce more robust and potent virtual screening libraries
Cafiero, M.
DOI: 10.1039/D5CP00692A

Abstract/Summary: Token generation in generative pretrained transformers (GPTs) that produce text, code, or molecules often uses conventional approaches such as greedy decoding, temperature-based sampling, or top-k or top-p techniques. This work shows that for a model trained to generate inhibitors of the enzyme HMG-Coenzyme-A reductase, a variable-temperature approach using a temperature ramp during inference produces larger sets of molecules (screening libraries) than either greedy decoding or single-temperature sampling. These libraries also have lower predicted IC50 values, lower docking scores, and lower synthetic accessibility scores than libraries produced by the other sampling techniques, especially when very short prompt lengths are used. This work explores several variable-temperature schemes for generating molecules with a GPT and recommends a sigmoidal temperature ramp early in the generation process.
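The abstract does not give the schedule's parameters, so the following Python sketch only illustrates the general idea: during autoregressive decoding, each step's logits are divided by a step-dependent temperature that follows a sigmoid rising early in generation, rather than by a single fixed value. The function names, the `logits_fn` interface, and all parameter values (endpoints, midpoint, steepness) are hypothetical placeholders, not the paper's actual settings.

```python
import math
import numpy as np

def sigmoid_temperature(step, t_low=0.7, t_high=1.3, midpoint=5, steepness=1.0):
    """Sigmoidal ramp from t_low toward t_high; the midpoint and steepness
    place the rise early in the generation process (values are illustrative)."""
    return t_low + (t_high - t_low) / (1.0 + math.exp(-steepness * (step - midpoint)))

def sample_with_ramp(logits_fn, prompt_tokens, max_new_tokens=64, eos_id=0,
                     rng=np.random.default_rng(0)):
    """Autoregressive sampling with a per-step temperature instead of a
    constant one. `logits_fn(tokens)` stands in for a decoder-GPT call that
    returns next-token logits as a 1-D numpy array."""
    tokens = list(prompt_tokens)
    for step in range(max_new_tokens):
        logits = logits_fn(tokens)        # next-token logits from the model
        t = sigmoid_temperature(step)     # temperature for this decoding step
        scaled = logits / t
        scaled -= scaled.max()            # numerically stable softmax
        probs = np.exp(scaled)
        probs /= probs.sum()
        next_id = int(rng.choice(len(probs), p=probs))
        if next_id == eos_id:             # stop at end-of-sequence token
            break
        tokens.append(next_id)
    return tokens
```

With this structure, greedy decoding and fixed-temperature sampling are the special cases where the schedule is replaced by an argmax or a constant, which is what makes the ramp a drop-in change to a standard sampling loop.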