A study of relational structure in multi-discrete action spaces in reinforcement learning

Moodley, P. (2024) A study of relational structure in multi-discrete action spaces in reinforcement learning. PhD thesis, University of Reading
It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

To link to this item DOI: 10.48683/1926.00117645

Abstract/Summary

This thesis proposes three novel methods for learning and exploiting relational structure in the action space of reinforcement learning (RL) environments. It provides initial evidence across model-free online and offline RL algorithms that mechanisms adapted for extracting and exploiting action structure can mitigate key challenges in multi-discrete domains, such as sparse rewards and large action spaces. The proposed techniques are shown to significantly improve the performance of RL algorithms in multi-discrete action spaces. Firstly, a multi-task approach trains agents across diverse procedural tasks, using state-action visitations to identify task-agnostic action space structure derived from bottlenecks. Count matrices reveal underlying action clusters that are transferred to enhance exploration in new tasks. The approach for extracting and transferring structure is novel compared with other work in multi-task RL and prior information transfer. The proposed approach successfully demonstrates the transfer of task- and context-agnostic action structure to new tasks and significantly improves convergence over baselines. Secondly, an auxiliary module for proximal policy optimisation (PPO) uses a self-supervised signal from successful state transitions to shape action representations around beneficial relationships. Compared with related work, the relational auxiliary objective is an uncomplicated approach to extracting action structure from multi-discrete action spaces online. The shaped representations demonstrate faster adaptation to complex tasks and better generalisation. Finally, Decision Transformers are adapted through novel multi-token expansions of multi-discrete actions, exposing more mixing opportunities.
Comparisons against single-token variants reveal consistent gains in the Deadly Corridor scenario within the ViZDoom platform. Further analyses confirm that individual actions are actively attended to by the Decision Transformer after multi-tokenisation. In summary, the contributions in this thesis provide good evidence that mechanisms that expose and exploit relational attributes can enhance sample efficiency and generalisation in multi-discrete action spaces. Multi-tokenisation and auxiliary modules are two methods that show particular promise for leveraging structure and merit further exploration. Further work remains in validating and interpreting learned relationships; however, this research direction appears fruitful.
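
The difference between a single-token and a multi-token encoding of a multi-discrete action can be illustrated with a minimal sketch. It is not taken from the thesis: the action-space sizes in `NVEC` and the helper names `single_token`, `multi_token` and `from_multi_token` are hypothetical, and the offset-based vocabulary is only one plausible way to assign a token to each sub-action.

```python
import numpy as np

# Hypothetical multi-discrete action space: three sub-actions
# with 3, 2 and 2 choices respectively (assumed for illustration).
NVEC = np.array([3, 2, 2])


def single_token(action, nvec=NVEC):
    """Flatten a multi-discrete action into one joint token id.

    The vocabulary size is prod(nvec); sub-actions are not visible
    as separate sequence elements.
    """
    return int(np.ravel_multi_index(tuple(action), tuple(nvec)))


def multi_token(action, nvec=NVEC):
    """Expand a multi-discrete action into one token per sub-action.

    Each sub-action gets its own id range via cumulative offsets,
    so the vocabulary size is sum(nvec) and a sequence model can
    attend to each sub-action individually.
    """
    offsets = np.concatenate(([0], np.cumsum(nvec)[:-1]))
    return (offsets + np.asarray(action)).tolist()


def from_multi_token(tokens, nvec=NVEC):
    """Invert multi_token, recovering the original sub-action indices."""
    offsets = np.concatenate(([0], np.cumsum(nvec)[:-1]))
    return (np.asarray(tokens) - offsets).tolist()


action = [2, 0, 1]
print(single_token(action))                    # 9
print(multi_token(action))                     # [2, 3, 6]
print(from_multi_token(multi_token(action)))   # [2, 0, 1]
```

In this sketch the single-token form needs a vocabulary of 12 joint actions, whereas the multi-token form needs only 7 tokens and lengthens the sequence by a factor of three per timestep.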