PhD Project - Embodied Cognition in Virtual Environments with Diachronic Analysis of Linguistic and Visual Inputs
Starting December 2020
Historical analysis of cognitive structures has been conducted through the study of spatial relationships in historical texts, images, and maps. Recent advances in machine learning present an opportunity to extend this analysis to embodied cognition by placing agents in virtual environments. Consider an agent that must navigate a virtual rendering of a historical location using linguistic instructions and visual cues derived from contemporaneous guidebooks and maps. Such an agent must process linguistic, visual, and spatial information related to the source documents under study, and through performing actions it will develop representations and reasoning grounded in the environment and its inputs. In this research, agents will be placed in virtual environments to perform two tasks. First, agents will navigate to a destination with the assistance of natural language instructions drawn from texts. On reaching the destination, agents will conduct place recognition using linguistic and visual cues derived from texts and images. The proposed research will enable diachronic analysis of cognitive structures in relation to the locations and entities in the artefacts under investigation.
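The two-stage episode structure described above can be sketched as a minimal driver loop. All class and method names here (`Observation`, `agent.act`, `agent.recognise`, `env.reset`, `env.step`) are illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """A single agent observation (fields are illustrative)."""
    instruction: str   # natural language instruction from a guidebook
    view: list         # placeholder for visual features at the current pose
    position: tuple    # (x, y) coordinate in the virtual environment

def run_episode(agent, env, max_steps=100):
    """Run one two-stage episode: navigate, then recognise the place.

    `agent` is assumed to expose `act(obs)` returning a move or "STOP",
    and `recognise(obs)` returning a place label; `env` is assumed to
    expose `reset()` and `step(action)`.
    """
    obs = env.reset()
    # Stage 1: follow the linguistic instructions toward a destination.
    for _ in range(max_steps):
        action = agent.act(obs)
        if action == "STOP":
            break
        obs = env.step(action)
    # Stage 2: identify the place from linguistic and visual cues
    # available at the final pose.
    return agent.recognise(obs)
```

The sketch only fixes the interface between the two tasks; the navigation policy and the place-recognition model behind `act` and `recognise` are the actual subjects of the research.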
This project will develop and evaluate machine learning methods for embodied vision and language tasks. The research plan combines visual, linguistic, and geospatial information in cross-modal and multimodal formulations. In the Vision and Language Navigation task, agents must learn from multimodal inputs, generate visual location summaries, and perform egocentric spatial localisation. The use of natural language instructions in this task requires systems that handle linguistic subtasks including coreference resolution, semantic change over time, and rare words. Place recognition will employ techniques from both computational linguistics and computer vision. A multitask framework will be optimised to conduct Vision and Language Navigation and place recognition over the linguistic, visual, and geospatial inputs. Constituent models in this framework will employ brain-inspired architectures and learning methods, with a focus on minimising the number of samples required to train a system on the second task. Statistical methods to be evaluated include testing log-normal and Laplace distributions as alternatives to Gaussian distributions when combining data.
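One way to carry out the distributional comparison mentioned above is to fit each candidate distribution by maximum likelihood and compare information criteria such as AIC. The following standard-library sketch (function names are illustrative, not from the project) computes the maximised log-likelihood of Gaussian, Laplace, and log-normal fits to a one-dimensional sample:

```python
import math

def gaussian_ll(xs):
    """Maximised Gaussian log-likelihood (MLE mean and variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def laplace_ll(xs):
    """Maximised Laplace log-likelihood.

    The MLE of the location parameter is a sample median; the MLE of
    the scale is the mean absolute deviation about it.
    """
    n = len(xs)
    mu = sorted(xs)[n // 2]
    b = sum(abs(x - mu) for x in xs) / n
    return -n * (math.log(2 * b) + 1)

def lognormal_ll(xs):
    """Maximised log-normal log-likelihood (requires positive data).

    The log-normal density of x equals the normal density of ln(x)
    divided by x, so the log-likelihood is the Gaussian fit on the
    logs minus the sum of the logs.
    """
    logs = [math.log(x) for x in xs]
    return gaussian_ll(logs) - sum(logs)

def aic(log_likelihood, k=2):
    """Akaike information criterion; all three models have k=2 parameters."""
    return 2 * k - 2 * log_likelihood
```

Fitting all three families to the same sample and selecting the lowest AIC gives a simple, defensible test of whether heavy-tailed or skewed alternatives fit the combined data better than a Gaussian.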