Variable-temperature token sampling in decoder-GPT molecule-generation can produce more robust and potent virtual screening libraries
Cafiero, M.
DOI: 10.1039/D5CP00692A

Abstract/Summary: Token generation in generative pretrained transformers (GPTs) that produce text, code, or molecules often uses conventional approaches such as greedy decoding, temperature-based sampling, or top-k or top-p techniques. This work shows that for a model trained to generate inhibitors of the enzyme HMG-Coenzyme-A reductase, a variable-temperature approach using a temperature ramp during inference produces larger sets of molecules (screening libraries) than either greedy decoding or single-temperature sampling. These libraries also have lower predicted IC50 values, lower docking scores, and lower synthetic accessibility scores than libraries produced by the other sampling techniques, especially when very short prompt lengths are used. This work explores several variable-temperature schemes for generating molecules with a GPT and recommends a sigmoidal temperature ramp early in the generation process.
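The abstract does not give the schedule's parameters, so the following Python sketch only illustrates the general idea: during autoregressive decoding, each step's logits are divided by a step-dependent temperature that follows a sigmoid rising early in generation, rather than by a single fixed value. The function names, the `logits_fn` interface, and all parameter values (endpoints, midpoint, steepness) are hypothetical placeholders, not the paper's actual settings.

```python
import math
import numpy as np

def sigmoid_temperature(step, t_low=0.7, t_high=1.3, midpoint=5, steepness=1.0):
    """Sigmoidal ramp from t_low toward t_high; the midpoint and steepness
    place the rise early in the generation process (values are illustrative)."""
    return t_low + (t_high - t_low) / (1.0 + math.exp(-steepness * (step - midpoint)))

def sample_with_ramp(logits_fn, prompt_tokens, max_new_tokens=64, eos_id=0,
                     rng=np.random.default_rng(0)):
    """Autoregressive sampling with a per-step temperature instead of a
    constant one. `logits_fn(tokens)` stands in for a decoder-GPT call that
    returns next-token logits as a 1-D numpy array."""
    tokens = list(prompt_tokens)
    for step in range(max_new_tokens):
        logits = logits_fn(tokens)        # next-token logits from the model
        t = sigmoid_temperature(step)     # temperature for this decoding step
        scaled = logits / t
        scaled -= scaled.max()            # numerically stable softmax
        probs = np.exp(scaled)
        probs /= probs.sum()
        next_id = int(rng.choice(len(probs), p=probs))
        if next_id == eos_id:             # stop at end-of-sequence token
            break
        tokens.append(next_id)
    return tokens
```

With this structure, greedy decoding and fixed-temperature sampling are the special cases where the schedule is replaced by an argmax or a constant, which is what makes the ramp a drop-in change to a standard sampling loop.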