Route selection in non-Euclidean virtual environments

The way people choose routes through unfamiliar environments provides clues about the underlying representation they use. One way to test the nature of observers' representation is to manipulate the structure of the scene as they move through it and measure which aspects of performance are significantly affected and which are not. We recorded the routes that participants took in virtual mazes to reach previously-viewed targets. The mazes were either physically realizable or impossible (the latter contained 'wormholes' that altered the layout of the scene without any visible change at that moment). We found that participants could usually find the shortest route between remembered objects even in physically impossible environments, despite the gross pointing failures that an earlier study showed in these same impossible environments. In the physically impossible conditions, the choice made at a junction was influenced to a greater extent by whether that choice had, in the past, led to the discovery of a target (compared to a shortest-distance prediction). In the physically realizable mazes, on the other hand, junction choices were determined more by the shortest distance to the target. This pattern of results is compatible with the idea of a graph-like representation of space that can include information about previous success or failure in traversing each edge and also information about the distance between nodes. Our results suggest that the complexity of the maze may dictate which of these is more important in influencing navigational choices.


Introduction
In order to navigate successfully in a 3D environment, human participants have to develop a mental representation of the scene, locate themselves within that representation and plan optimal actions to reach a target. The exact form that such a mental spatial representation might take is still debated.
One view is that the spatial representation corresponds to a cognitive map [2–4], i.e. a stable 3D reconstruction of the environment (whether accurate or not). This provides the most complete description of the environment and can be used for versatile spatial tasks such as planning an optimal route, exploring novel shortcuts or pointing to unseen targets. It could be constructed by means of path integration [5], and fully working implementations of this model are now common in the computer vision and robotics literature based on visual SLAM (Simultaneous Localisation and Mapping) [6], which integrates information from views over multiple vantage points. It has been argued that in small and relatively simple environments such as 'vista spaces' participants have access to a relatively accurate cognitive map within a confined region [7,8], although even in the case of vista spaces there is dispute about whether the underlying representation is Euclidean [9], i.e. corresponds to a rigid 3D reconstruction. However, in larger and more complex environments there is greater agreement that Euclidean reconstruction is a poor model. For instance, the perceived length of a route depends on the number of turns and decision points it contains [10–12], angular and directional judgments are highly inaccurate [8,13–15] and perceived angles between junctions are biased towards 90° [11,16]. Hence, while mental representations of small open environments can often appear to be consistent locally, participants typically have difficulties integrating local representations into a single global representation (as has been argued for other primates, too). In particular, performance in large environments is much more likely to be compatible with a distorted or globally inconsistent map [16,17]. This led Kuipers [18] to suggest that the concept of a global 'Map in the Head' should be replaced by an 'Atlas in the Head', with many local maps on separate sheets.
Similar ideas of independent reference frames consisting of multiple vista spaces were also proposed in more recent studies [7,8]. It is not clear how these local representations are used by participants when they are confronted by a spatial task (such as pointing) that forces them to integrate information across different local reference frames except that, as Meilinger and colleagues say [7], pointing appears effortful and performance depends on many factors such as the order in which the route was learned. Experimental evidence suggests that performance in this case relies on a representation (or a process of accessing information from a representation) that is not only distorted but also inconsistent with the idea of a single global map [1,19–21].
In an early seminal paper, Siegel and White [22] suggested that, in large-scale environments, spatial representation develops gradually and goes through three main phases: landmark knowledge (salient features), route knowledge (topological connectivity of the space) and survey knowledge (construction of a cognitive map) [23]. Developing this type of idea, Kuipers [18] suggested that, as more information becomes available about an environment, 'topological connections can be strengthened into relative-position vectors' and then, ultimately, a representation uniting multiple frames of reference. He emphasized the co-existence of multiple strategies based on different levels of detail, which he described as a cognitive map having 'many states of partial knowledge'. Montello [24] criticized Siegel and White's idea, pointing out that there can be gradual 'quantitative accumulation and refinement of metric knowledge'. Ishikawa and Montello [15] set out to test the developmental progression of representations that Siegel and White and others have advocated and found very little learning across trials (although no feedback was given). They emphasised the fact that some individuals acquired 'surprisingly accurate metric knowledge, even relatively quickly' relating locations between which they had not travelled directly. In line with this finding, when Newcombe and colleagues [25–27] tested a large number of participants in virtual reality (VR), they found that there was significant variation in the ability of people to integrate spatial information across routes: participants' pointing performance within a familiar route was not necessarily a good predictor of their ability to point between targets on two different familiar routes.
Warren [28] has drawn together much of the literature on navigation in Euclidean (physically possible) and non-Euclidean environments, arguing that the evidence points to humans using a 'labelled graph' (Chrastil and Warren [28], Strickrodt et al [20], Warren et al [19]). This lies between a topological graph and survey knowledge because each edge of the graph can include information about the length of the path connecting its two nodes, and there can be information stored about the angle between edges. Warren [29] emphasizes the difference that he sees between a labelled graph and a Euclidean map: "One would expect edge weights and node labels to become more accurate and precise with repeated exposure to an environment, but this does not indicate a qualitative shift from topological to Euclidean knowledge." (p4). In other words, he advocates the view that graph structure and local metric information can be acquired in parallel, as Ishikawa and Montello [15] propose. Nevertheless, it is logically possible that a topological graph and a Euclidean map are two opposite extremes of a spectrum and that increased learning and more precise calibration of the information associated with each edge can lead, in the end, to a graph representation that is, for practical purposes, the same as a Euclidean map. In the case of a maze, a topological graph contains no information about the length of each corridor or the number of turns or the angle between corridors; a labelled graph contains approximate estimates of these values; and a graph in which all the information about each edge (corridor) is precise and internally consistent is indistinguishable in practice from a Euclidean map. A very similar spectrum has been proposed for the processing of disparity information to guide judgements of ordinal depth, bas-relief depth or Euclidean shape [30,31].
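The distinction between a topological graph, a labelled graph and a Euclidean map can be made concrete with a small sketch. The code below is purely illustrative (the node names, edge lengths and function are our own, not taken from any cited model): a bare adjacency list captures connectivity only, while attaching an approximate corridor length to each edge yields a 'labelled graph' that can already support shortest-path queries without ever being assembled into a globally consistent Euclidean map.

```python
import heapq

# Topological graph: connectivity only (illustrative node names).
topo = {'S': ['N1'], 'N1': ['S', 'N2', 'R'],
        'N2': ['N1', 'G'], 'R': ['N1'], 'G': ['N2']}

# Labelled graph: each edge additionally carries an approximate corridor
# length (metres); these labels need not be globally consistent.
labelled = {('S', 'N1'): 4.0, ('N1', 'N2'): 3.0,
            ('N1', 'R'): 2.5, ('N2', 'G'): 1.5}
edges = {**labelled, **{(b, a): d for (a, b), d in labelled.items()}}

def shortest_path(edges, start, goal):
    """Dijkstra over the labelled graph: metric queries from local edge labels."""
    frontier = [(0.0, start, [start])]
    seen = set()
    while frontier:
        dist, node, path = heapq.heappop(frontier)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for (a, b), d in edges.items():
            if a == node and b not in seen:
                heapq.heappush(frontier, (dist + d, b, path + [b]))
    return float('inf'), []

dist, path = shortest_path(edges, 'S', 'G')  # -> 8.5, ['S', 'N1', 'N2', 'G']
```

A topological graph supports only the question "is G reachable from S?"; adding edge labels makes the question "which route to G is shortest?" answerable as well, without any global coordinate frame.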
There have been many studies that have explored the extent to which participants can encode actions that have led to a successful result in the past and incorporate this in their representation [32–35]. Marchette et al [34] showed that in a navigational experiment, when searching for targets, some participants found novel shortcuts easily, while other participants preferred less efficient but more familiar routes that they had experienced during the learning phase. fMRI analysis showed that participants who preferred shortcuts had stronger activation in the hippocampal area, while participants who followed the more familiar route had stronger activation in the caudate, which encodes reward. Accurate pointing and reliable identification of novel shortcuts both require a globally unified representation, i.e. more than just following previously rewarded routes.
Interestingly, in the reinforcement learning literature there has been a recent focus on representations that are similar to the 'response-like' model in that they learn what action to carry out at each decision point (given a particular goal) rather than computing a global map [36].
In this paper, we build on our previous study of human pointing errors in a virtual maze [1] which, like the current study, examined the consequences of exploring a physically impossible maze.
The maze had long corridors with many turns in a way that could not be realized in the real world ('wormholes'), similar to the manipulations many other researchers have used to explore spatial behaviours in non-Euclidean environments ([9,19,37,38]). The conclusion of our previous paper was that the most likely explanation of the data in this type of condition was a representation that has no Euclidean interpretation. The current paper examines the performance of the same participants in the same experiment but, instead of analysing the pointing responses, we report the ability of participants to find the shortest distance through a maze to a target. This task is suited to finding out what information participants use to choose a path when they are at a junction, not to finding out whether they use a Euclidean reconstruction or a graph-like representation. Indeed, if observers have a Euclidean representation that includes the target and their current location, and the task is to choose the shortest route from their representation, then they should do that independent of any past experience of reward. A graph-based representation is more flexible. Initially, observers may only store information about whether or not they have travelled down a particular path and whether this led to the object that is their current goal (similar to 'response-learning', [34,39,40]). Later, they may add information about the distance between nodes. In the current experiment (to anticipate our results), we find that the more complex the maze, i.e. with wormholes, the more likely participants are to choose previously rewarded routes. In the Discussion, we consider how this relates to the idea that people may begin with a topological map of connectivity and gradually add information about reward and distance along corridors once they gain more experience of the environment.

Participants
The 14 participants (5 male and 9 female) who completed the experiment were students or members of the School of Psychology and Clinical Language Sciences. All participants had normal or corrected-to-normal vision (6/6 Snellen acuity or better; one participant wore glasses during the experiment) and all had good stereo-acuity (TNO stereo test, 60 arcsec or better). All participants were naïve to the purpose of the study. Participants were given a one-hour practice session in VR to familiarize them with our set-up using physically possible mazes. We called physically possible mazes 'Fixed', for short, as they did not change as the participant moved around them. 10 potential participants (in addition to the 14 who took part) either experienced motion sickness during the practice session or could not move confidently in VR and thus preferred not to continue at this stage (any participants who were excluded withdrew before data was collected for either 'base layout' used in the experiment). Altogether, there were 7 sessions (including the practice), each of about 1 hour, conducted on different days. Participants were advised not to stay in VR longer than 10 minutes between breaks. They were paid 12 pounds per hour. The study received approval from the Research Ethics Committee of the University of Reading.

Experimental set-up
The Virtual Reality laboratory was equipped with a Vicon tracking system with 12 infrared cameras (T20 and Bonitas). We used an nVision SX111 head mounted display with a large field of view (111° horizontally with a binocular overlap of 50°). The resolution of the LCD displays was 1280 by 1024 pixels. The headset was calibrated using the method described in [41] in order to minimize optical distortions in the stimuli. The HMD was connected via a 4m-long video cable to a video controller unit on the ceiling. The Vicon tracking system (Tracker 3.1) provided an estimate of the position and orientation of the headset with a nominal accuracy of ±0.1 mm and 0.15° respectively at a frequency of 240Hz and relayed this information to a graphics PC with a GTX 1080 video card.
The stimuli were designed in Unity 3D software [42] and rendered online at 60fps. Participants were allowed to walk freely and explore the virtual environment in a natural way, although they had to hold the HMD video cable behind them and had to take care that the cable did not become tangled as they walked. The experimenter was always close by to ensure that the cable remained behind them. The physical size of the labyrinth was limited to a 3 by 3m region in the lab. The virtual labyrinth was originally a 5 by 5m environment with corridors in the maze 1m wide. In order to fit in the 3 by 3m space, the labyrinth was shrunk to 0.6 scale (e.g. 60cm wide corridors) which meant that the floor was displayed about 1m below eye height. Participants generally found this acceptable and did not notice that the room was not normal size, consistent with previous reports [9]. During the experiment, participants wore a virtual wristband that provided information about the task (shown, for illustrative purposes only, in the bottom-right corner of Fig. 1B). In the pointing phase of the experiment, participants used a hand-held 3D tracked pointing device to point at targets. In VR, the pointing device was rendered as a small sphere (R=5cm) with an infinitely long ray emanating from it in both directions, although the ray could not be seen beyond the corridor walls. Text was displayed on a panel attached to the ray providing instructions (e.g. 'point to Red'). The 6 d.o.f. pose of the cyclopean point (a point midway between the eyes), together with the orientation of the headset was recorded on every frame (60 fps).

Stimuli
We designed two general layouts of the virtual labyrinth (Layout 1 is shown in Fig. 2). The targets were hidden inside open grey boxes, so that they could be seen only from a short distance (Fig. 1B). Other empty grey boxes were added as distractors.
For each labyrinth, we increased the complexity of the environment by extending the length of the corridors with non-metric 'wormholes' (see Fig. 3).

Fig. 3. Topological graphs corresponding to the schematics shown in Fig. 2A, B and C (Layout 1). Coloured circles represent targets; S, N1 and N2 are 3-way junctions; S is the start location.

Procedure
Participants followed the instructions they were given, finding the four targets shown on their wristband in the specified order and then, when they reached the fourth target, their task was to point at the other targets and at the Start location. In the course of one experimental session, which took about 1 hour, participants were tested sequentially on the three types of maze, i.e. Fixed, one-wormhole and three-wormhole conditions, all with the same general layout (i.e. all Layout 1 or Layout 2). This was a deliberate design that helped participants to navigate in the more complex environments. The tasks and instructions were identical for all three conditions. The instructions given to participants were to collect all four target objects in a specified order in the most efficient way.
'Collect' meant approaching sufficiently close to the target (within a radius of 0.5m from the cyclopean point and within the field of view), which caused its colour to change from bright to dull and, at the same time, the colour of that ball changed in the same way on the wrist-mounted panel. The meaning of 'efficient' was not defined precisely for participants, although it was emphasized to them that they should not hurry and that their performance was not being judged by their speed. 'Efficient' could mean choosing the shortest path, or the smallest number of turns or junctions (i.e. navigational decisions); this was left to participants to decide.
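For illustration, the 'collect' criterion described above can be written as a simple proximity-and-visibility test. This is our own reconstruction, not the authors' code: the function name is invented, and the field-of-view half-angle is an assumption based on the 111° horizontal FOV of the headset.

```python
import math

def collected(target_pos, head_pos, head_forward, radius=0.5, half_fov_deg=55.5):
    """Target counts as 'collected' when it lies within `radius` metres of
    the cyclopean point AND inside the horizontal field of view.
    `head_forward` is the (unit) gaze direction; half_fov_deg is assumed
    from the headset's 111-degree horizontal FOV."""
    dx = [t - h for t, h in zip(target_pos, head_pos)]
    dist = math.sqrt(sum(d * d for d in dx))
    if dist > radius or dist == 0.0:
        return False
    # angle between gaze direction and direction to target
    cos_angle = sum(d * f for d, f in zip(dx, head_forward)) / dist
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= half_fov_deg
```

A target 0.3 m directly ahead would be collected; the same target directly behind the observer, or 0.6 m ahead, would not.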
After all four targets were collected, the participant was instructed to remain at the location of the fourth target and carry out the pointing task. Participants' trajectories were encoded as sequences of nodes, where S is Start and N1 and N2 are the 3-way junctions shown in Fig. 3. This labelling of the routes participants made was a prerequisite to modelling their navigational decisions, as described in the next section.

Results and modelling
When participants are allowed to move freely through a maze, it can be challenging to aggregate their data in meaningful ways. Our principal solution to this problem was to compare the choices participants made at each junction with the predictions of candidate models (Figure 6B and 6C). In the following section, we consider two models. One takes into account the participant's previous experience and whether one path or another was successful in the sense that it led, ultimately, to the goal that the participant had at the time. If so, this model predicts that the path is more likely to be taken during the test phase. We call this a 'Rewarded-route model'. This approach is somewhat similar to the 'Dual Solution Paradigm' proposed by Marchette et al [34]. Even though in our experiment participants were not restricted in their paths during the learning phase, as they were in Marchette's experiment, it is still possible for us to evaluate the degree of familiarity of the routes that participants took in the test phase. The second model assumes that the participant knows the length of all paths to the goal. We call this the 'Shortest-distance model'.

Rewarded-route model
The rewarded-route model takes into account all navigational decisions that participants took during the learning phase, and the success or otherwise of the route that they took, and uses this information to predict how they might behave during the three test rounds for that condition. Consider the connectivity matrix for Layout 1 with one wormhole shown in Fig. 7B. This shows which routes are possible between any two nodes in the graph (Fig. 7A). The rows represent 'beginning' nodes, i.e. places where the participant has a choice about which way to go; the columns represent 'end' nodes, i.e. where the participant arrives after having made that decision; and a '1' means it is possible to travel directly between the two (i.e. there is an edge in the graph between these two nodes). In order to predict the choices that participants will make in the test phase, separate decision matrices are required per participant and per goal (because a participant might be expected to make a different choice at a given junction depending on whether their goal was R, G, B or Y). These were generated as follows. Starting with the default likelihood matrix (Fig. 7C, i.e. random choices), the likelihoods associated with each choice were updated in a way that reflected the participant's success whenever they found the target. We re-played all the participants' trajectories from the learning phase. If the participant found the target at the end of a particular route then, the next time the participant reached the same junction with the same goal, the model assumed the participant was more likely to make the same choice again. To explain how this is done in detail, consider an example in which the participant's goal was R and their path was Start-G-B-N1-R. Since the Red target was found successfully, the decision matrix is updated by increasing the likelihood of all the decisions that made up that path.
The update rule has one free parameter, α, that determines the learning rate. Specifically, the likelihoods of the steps S-to-G, G-to-B, B-to-N1 and N1-to-R (i.e. steps that successfully led to the goal R) are all increased according to this rule. The resulting decision matrix (Fig. 8B) illustrates that, in this example, the participant's behaviour during the test phase is consistent with their experience during the learning phase.
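The exact update formula did not survive in this text, so the following sketch implements one plausible version of a rewarded-route update (our assumption, not necessarily the authors' exact rule): after a successful route, the likelihood of each choice along it is moved toward 1 with learning rate α, and the options at that junction are renormalized so that they still sum to 1.

```python
alpha = 0.3  # assumed learning rate (the model's one free parameter)

def update_decision_matrix(P, path, goal, alpha=alpha):
    """P[goal][(node, next_node)] is the likelihood of choosing next_node
    at `node` when searching for `goal`. `path` is a successful route,
    given as the sequence of nodes visited."""
    probs = P[goal]
    for node, nxt in zip(path, path[1:]):
        # move the successful choice's likelihood toward 1
        probs[(node, nxt)] += alpha * (1.0 - probs[(node, nxt)])
        # renormalize all choices available at this junction
        siblings = [k for k in probs if k[0] == node]
        total = sum(probs[k] for k in siblings)
        for k in siblings:
            probs[k] /= total
    return P

# Example: at junction N1 the participant can head toward R or toward N2.
# A successful step N1-to-R (goal R) makes that choice more likely next time.
P = {'R': {('N1', 'R'): 0.5, ('N1', 'N2'): 0.5}}
P = update_decision_matrix(P, ['N1', 'R'], 'R')
# P['R'][('N1', 'R')] is now greater than P['R'][('N1', 'N2')]
```

With α = 0, choices stay at chance; larger α makes the model converge more quickly onto previously rewarded routes.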

Shortest-distance model
The shortest-distance model assumes that the participant knows the length of every path to the goal. We assume that estimates of path length are subject to Gaussian noise whose standard deviation is proportional to the path length (Weber's law), with the constant of proportionality, k, as a free parameter. The likelihood of confusing the two routes available at a junction can be estimated from the area of the intersection of the two Gaussians and, since there are only two options, this determines the likelihood of taking the shorter route.
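Under these assumptions the probability of choosing the shorter of two routes can be computed directly. The sketch below is our own formalization for illustration (the original derivation is phrased in terms of the overlap of the two Gaussians; an equivalent and convenient closed form asks for the probability that the noisy estimate of the shorter route comes out smaller than that of the longer one).

```python
import math

def p_choose_shorter(d1, d2, k):
    """Probability that the noisy estimate of the shorter route (d1 <= d2)
    is smaller than that of the longer route. Each estimate has sd = k * length
    (Weber's law); the difference of two independent Gaussians is Gaussian,
    so P = Phi((d2 - d1) / sqrt((k*d1)^2 + (k*d2)^2))."""
    sigma = k * math.sqrt(d1 ** 2 + d2 ** 2)
    z = (d2 - d1) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# With k = 0.2, routes of 5 m vs 8 m are discriminated well above chance:
p = p_choose_shorter(5.0, 8.0, 0.2)
```

Note the model's key qualitative predictions: equal-length routes are chosen at chance (p = 0.5), and because noise scales with length, the same absolute difference is harder to discriminate between two long routes than between two short ones.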

Model comparison
We compare the performance of the two models in predicting the binary choices participants made during the test phase (the last 3 rounds of 8), i.e. at each 3-way junction (we assumed that participants did not go backwards at a junction, which was extremely rare in practice). We also sampled from a chance model, i.e. one in which a model participant chooses between the options at any junction with equal probability. This is, however, a highly unlikely model: it gave rise to negative log likelihoods of over 2000 for each condition, far outside the range both of participants' data and of our two models.
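Model comparison of this kind reduces to summing, over all junction choices in the test phase, the negative log of the probability each model assigned to the choice actually made. A minimal sketch (variable names and data are ours, for illustration only):

```python
import math

def negative_log_likelihood(choices, model_probs):
    """choices: list of (junction, option_taken) pairs observed in the test
    phase; model_probs[junction][option] is the probability a model assigns
    to taking that option. Lower NLL means a better-fitting model."""
    return -sum(math.log(model_probs[j][opt]) for j, opt in choices)

# A chance model assigns p = 0.5 at every binary junction, so its NLL over
# n choices is exactly n * log(2).
chance = {'N1': {'left': 0.5, 'right': 0.5}}
observed = [('N1', 'left')] * 10          # 10 illustrative choices
nll = negative_log_likelihood(observed, chance)   # = 10 * log(2)
```

Because NLL grows linearly with the number of choices, a chance-model NLL above 2000 corresponds to roughly 2000/log(2), i.e. on the order of three thousand binary decisions per condition.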

Discussion
We have measured the ability of participants to find the shortest route to a previously-viewed target in a virtual labyrinth, especially in cases where the labyrinth has a non-Euclidean structure.
Participants' success in this task contrasted markedly with the drastic failures in pointing to previously-viewed targets that we have described before [1], despite the fact that both measures were obtained from the same participants in the same experimental setup. Our main finding is that participants' route-finding performance in the complex, non-physically-realisable, 'wormhole' conditions was predicted better by a rewarded-route model than by a shortest-distance model. In other words, in these wormhole environments participants tended to re-trace the routes that had been successful before when searching for the same target. By contrast, in the simpler, physically-realisable environments participants' route-finding was best predicted by a shortest-distance model. Marchette and colleagues [34] described these as 'response' and 'place' strategies respectively, and found that participants spanned a wide range between the two extremes. We found that, within participant and tested over the same number of trials, the relative dominance of the two strategies changed depending on the complexity of the scene. Hence, the variation in strategy cannot be due only to individual differences or the number of times an observer experiences an environment [15,24–27]. Instead, the length of corridors and the number of twists and turns down each seem to have an important effect on the way people tackle the navigation task. This might also be true in a complex Euclidean environment with many twists and turns.

If observers use a graph-like representation, then this change in strategy with different degrees of complexity of the environment is easy to explain. In line with Siegel and White [22] and others [14], our working hypothesis is that observers start with a representation of connectivity and gradually add information about the edges between nodes. This is a flexible notion. The information about edges could be quite crude ('shorter than average' versus 'longer than average') but in theory it could include much more precise information, up to and including sufficient information about the distances and angles between nodes that the graph representation becomes logically and experimentally indistinguishable from a Euclidean representation. As discussed in the Introduction, a similar argument has been made for the representation of object shape [30,31]. The two types of information that we have explored in this paper, i.e. past success in arriving at the relevant target and distance along an edge, can both be seen as part of this hierarchical progression, adding detail to a stored graph. We have argued that, of these two, information about rewarded routes is more fundamental and is used by participants to guide them in complex, wormhole environments.
A speculation that goes beyond our data, but which is testable, is that the same result would be observable in 'fixed' environments of different degrees of complexity even without introducing non-Euclidean elements in the maze such as wormholes. If it were possible to let participants explore far more complex (but 'fixed', Euclidean) environments and, on other trials, wormhole environments then participants could carry out two tasks simultaneously: (i) search for targets, as in the current experiment, and (ii) judge, in a forced-choice paradigm, whether they believed they were in a complex 'fixed' environment or a 'wormhole' environment. Our prediction is that they would find the second of these tasks quite difficult. We also predict that the rewarded-route model would be the best model of their navigation strategy for both types of environment during the period of learning when they are unable to discriminate between 'Fixed' and non-Euclidean environments. Likewise, a shortest-distance model would be a better model when they became more familiar with the environments, again independent of whether the environment was 'fixed' or not (and also independent of whether the participant judged the environment to be 'fixed' or not). Such an experiment would establish whether the Euclidean structure of the environment was important per se, independent of complexity and familiarity. By contrast, a graph-based model would predict the Euclidean structure per se is not important in predicting performance.
Finally, it is worth comparing the navigation data in the current paper to the pointing data in our previous paper collected in the same environment [1], because, unlike the navigation task, pointing is a direct way of testing whether participants can form a Euclidean representation of the scene. Muryy and Glennerster [1] applied different models to the pointing data and concluded that a Euclidean representation could not account for the pointing responses of participants in the three-wormhole condition as successfully as a non-Euclidean one. The non-Euclidean model in that case allowed both the perceived location and orientation of the observer to vary as they moved around the maze (yellow bars in Figure 11). The conclusion reached was similar to that in the current paper, i.e. that in the three-wormhole environment participants use a cruder form of representation. In more familiar environments (the 'fixed' condition), participants add information to this representation so that, at its most extreme, the information about each edge in the graph is so rich that the representation is equivalent to full Euclidean structure.

Fig. 11. Data reproduced from Muryy and Glennerster [1]. Bayesian Information Criterion is used to compare performance of a metric and a non-metric model of pointing data in the same environment as the current experiment (adapted from Fig 9B in [1]). Unlike the models compared in the current paper, the metric and non-metric pointing models were nested with different numbers of parameters and hence BIC is an appropriate method of comparison.
It is logically possible for observers to show excellent performance on the navigation task while making large errors in the pointing task, provided one assumes that there is no common, Euclidean representation supporting both tasks. If the visual system relied on a common representation for both tasks, there should be a correlation between the two measures of performance. In each case, we can take measures that indicate how 'lost' a participant is, one from their navigation and one from their pointing. For navigation, we take the ratio of travelled distance to the shortest distance for a full round (including all 4 targets). For participants who are very familiar with the environment, this ratio should be close to one. For pointing, we take the mean absolute pointing error measured for 8 pointing directions (4 targets) at the end of a round as a different measure of how lost they are. In the 'Fixed' condition, there is a significant positive correlation between these two measures, as one might expect (Pearson correlation 0.43, p < 10^-9). On the other hand, for both wormhole conditions there is no significant correlation (0.02, p=0.70 and 0.07, p=0.35 for WH1 and WH3 respectively), see Figure S4 in Supplementary Material. This supports the contention that the two measures of 'being lost' are not necessarily linked, something that is compatible with a graph-like representation but that one would not expect if the observer relied on a Euclidean map for both tasks. There are many examples of such task-dependency in spatial tasks [9,31,43–45]. A recent example is the demonstration by Strickrodt et al [20] that participants can point in quite different directions to the same target depending on how they imagine arriving at it [29]. The authors conclude that local spatial information is not integrated into a coherent global map. The data we have presented here, especially when considered in conjunction with the pointing data from [1], support this view.
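The two 'lostness' measures and their correlation can be expressed compactly. The sketch below uses fabricated per-round values purely for illustration (the function names are ours, and `pearson_r` just restates the standard formula): a path-efficiency ratio near 1 indicates a near-optimal route, and the presence or absence of a correlation between the two measures is what distinguishes the Fixed condition from the wormhole conditions.

```python
import math

def path_ratio(travelled, shortest):
    """Navigation 'lostness': distance travelled over the shortest possible
    distance for a full round; 1.0 means a perfectly efficient route."""
    return travelled / shortest

def mean_abs_pointing_error(errors_deg):
    """Pointing 'lostness': mean absolute error over the pointing directions."""
    return sum(abs(e) for e in errors_deg) / len(errors_deg)

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative (fabricated) per-round values for one participant/condition:
ratios = [1.0, 1.2, 1.1, 1.6, 1.4]       # path efficiency per round
errors = [10.0, 25.0, 18.0, 60.0, 40.0]  # mean abs pointing error (deg)
r = pearson_r(ratios, errors)  # positive here, as in the Fixed condition
```

On a shared Euclidean representation the two measures should co-vary, as they do in this fabricated example; the absence of any such correlation in the wormhole conditions is what argues against a single common map.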