Understanding 3D vision as a policy network

Glennerster, Andrew

Text (Open Access)
- Published Version
· Available under License Creative Commons Attribution.

Glennerster, A. ORCID: https://orcid.org/0000-0002-8674-2763 (2023) Understanding 3D vision as a policy network. Philosophical Transactions of the Royal Society B-Biological Sciences, 378 (1869). ISSN 1471-2970 doi: 10.1098/rstb.2021.0448

Abstract/Summary

It is often assumed that the brain builds 3D coordinate frames, in retinal coordinates (with binocular disparity giving the 3rd dimension), head-centred, body-centred and world-centred coordinates. This paper questions that assumption and begins to sketch an alternative based on, essentially, a set of reflexes. A 'policy network' is a term used in reinforcement learning to describe the set of actions that are generated by an agent depending on its current state. This is an untypical starting point for describing 3D vision, but a policy network can serve as a useful representation both for the 3D layout of a scene and the location of the observer within it. It avoids 3D reconstruction of the type used in computer vision but is similar to recent representations for navigation generated through reinforcement learning. A policy network for saccades (pure rotations of the camera/eye) is a logical starting point for understanding (i) an ego-centric representation of space (e.g. Marr's (1982) 2.5-D sketch) and (ii) a hierarchical, compositional representation for navigation. The potential neural implementation of policy networks is straightforward; a network with a large range of sensory and task-related inputs such as the cerebellum would be capable of implementing this input/output function. This is not the case for 3D coordinate transformations in the brain: no neurally implementable proposals have yet been put forward that could carry out a transformation of a visual scene from retinal to world-based coordinates. Hence, if the representation underlying 3D vision can be described as a policy network (in which the actions are either saccades or head translations), this would be a significant step towards a neurally plausible model of 3D vision.
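As a loose illustration of the idea that a policy network maps the agent's current state to an action, the following minimal sketch builds a toy saccade policy as a lookup table. It is not the paper's model. The object names, the gaze angles, the helper names `build_saccade_policy` and `act`, and the treatment of a saccade as an additive (azimuth, elevation) offset in degrees are all assumptions made for the example; real eye rotations do not in general compose by simple addition.

```python
from typing import Dict, Tuple

Saccade = Tuple[float, float]  # (azimuth change, elevation change) in degrees
State = Tuple[str, str]        # (currently fixated object, goal object)

# Hypothetical gaze directions of three objects seen from one vantage point.
# They are used only to construct the toy policy; once built, the policy is
# queried without reference to any coordinate frame.
directions: Dict[str, Tuple[float, float]] = {
    "cup": (-20.0, -5.0),
    "lamp": (15.0, 10.0),
    "door": (40.0, 0.0),
}


def build_saccade_policy(
    dirs: Dict[str, Tuple[float, float]]
) -> Dict[State, Saccade]:
    """Tabulate, for every (fixated, goal) pair, the saccade that moves gaze."""
    policy: Dict[State, Saccade] = {}
    for fixated, (az0, el0) in dirs.items():
        for goal, (az1, el1) in dirs.items():
            if fixated != goal:
                policy[(fixated, goal)] = (az1 - az0, el1 - el0)
    return policy


def act(policy: Dict[State, Saccade], fixated: str, goal: str) -> Saccade:
    """Return the action for the current state: a pure state-to-action lookup."""
    return policy[(fixated, goal)]


policy = build_saccade_policy(directions)
first = act(policy, "cup", "lamp")    # saccade from the cup to the lamp
second = act(policy, "lamp", "door")  # saccade from the lamp to the door
direct = act(policy, "cup", "door")   # saccade straight from cup to door
```

In this toy setting the table of saccades carries the same information as the ego-centric layout of the scene: chaining the cup-to-lamp and lamp-to-door saccades gives the same total gaze shift as the direct cup-to-door saccade, without any explicit 3D reconstruction.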

Item Type	Article
URI	https://centaur.reading.ac.uk/id/eprint/106600
Identification Number/DOI	10.1098/rstb.2021.0448
Refereed	Yes
Divisions	Interdisciplinary Research Centres (IDRCs) > Centre for Integrative Neuroscience and Neurodynamics (CINN); Life Sciences > School of Psychology and Clinical Language Sciences > Department of Psychology; Life Sciences > School of Psychology and Clinical Language Sciences > Neuroscience; Life Sciences > School of Psychology and Clinical Language Sciences > Perception and Action
Publisher	The Royal Society

Deposit Details

Date Deposited:	22 Aug 2022 11:47
Last Modified:	09 Jun 2025 18:33