Princeton University Library Catalog

Towards First-Person Context Awareness: Discovery of User Routine from Egocentric Video using Topic Models and Structure from Motion

Author/Artist:
Jabri, Allan
Format:
Senior thesis
Language:
English
Advisor(s):
Xiao, Jianxiong
Department:
Princeton University. Department of Computer Science
Class year:
2015
Description:
62 pages
Summary note:
One of the ultimate goals of our pursuit of AI is to create intelligent machines that help us live our lives. To help us, these agents must gather a sense of our context. Already, personal computing technologies like Google Now use egocentric (first-person) data, such as email, calendar, and other personal routine information, as actionable context. Recently, wearables have made it easy to capture many types of egocentric data, including visual data. It is easy to imagine the potential impact of a context-aware intelligent assistant, ‘aware’ not only of textual data but also of immediate visual information, for applications ranging from assisted daily living to annotated augmented reality and self-organized lifelogs. We imagine a future in which wearable computing is ubiquitous and, as a result, lifelogs and similar visual data are abundant. Understanding user routine from ‘big’ egocentric data then emerges naturally as an important machine learning problem. Our key observation is that egocentric data is ‘overfit’ to the person wearing it. Because human behavior tends to be periodic (hence the notion of ‘routine’), lifelog data must then be a series of manifestations of periodic scenes. Using techniques inspired by work in scene understanding, ubiquitous computing, and 3D scene modeling, we propose two complementary approaches for discovering routine structure in egocentric image data. We take a scene understanding approach, interpreting routine as periodic visits to meaningful scenes. For a robust representation of routine visual scenes, we formulate routine visual context as probabilistic combinations of scene features discovered from a visual lifelog corpus using topic modeling. Concurrently, we discover the 3D spatial structure of routine scenes by incrementally building structure-from-motion (SfM) models from images of the same spatial context.
As a proof of concept, we implement our framework using Google Glass and an infrastructure we call SUNglass.
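The topic-modeling idea in the summary (routine visual context as probabilistic combinations of scene features) can be illustrated with a minimal sketch. It assumes each lifelog image has already been reduced to a bag of "visual words" (integer ids from a quantized vocabulary of local descriptors) and fits standard latent Dirichlet allocation by collapsed Gibbs sampling. The record does not say which topic model or inference method the thesis used, so the model choice, the function names `lda_gibbs` and `dominant_topic`, the hyperparameters, and the toy corpus are all hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to bag-of-visual-words images by collapsed Gibbs sampling.

    Hypothetical sketch, not the thesis's implementation. docs is a list of
    images, each a list of integer visual-word ids in range(vocab_size).
    Returns (theta, phi): theta[d] is image d's topic mixture and phi[k] is
    topic k's distribution over visual words.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]        # topic counts per image
    nkw = [[0] * vocab_size for _ in topics]    # visual-word counts per topic
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # topic assignment of each token

    # Random initial assignment of every visual word to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the resampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed estimates of the image-topic and topic-word distributions.
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in topics
    ]
    return theta, phi


def dominant_topic(mixture):
    """Index of the most probable topic in an image's topic mixture."""
    return max(range(len(mixture)), key=mixture.__getitem__)


# Hypothetical toy corpus: visual words 0-2 stand for one place (say, a kitchen)
# and words 3-5 for another (say, an office); each "image" has 30 visual words.
kitchen = [[0, 1, 2] * 10 for _ in range(4)]
office = [[3, 4, 5] * 10 for _ in range(4)]
theta, phi = lda_gibbs(kitchen + office, n_topics=2, vocab_size=6)
print([dominant_topic(t) for t in theta])
```

Each row of `theta` is one image's mixture over discovered scene topics, which is one plausible reading of "probabilistic combinations of scene features". On cleanly separated data like this toy corpus, images of the same place would typically share a dominant topic, so repeated visits to a scene show up as recurring topic assignments over time; Gibbs sampling can, however, settle in a local mode, so results on real lifelogs would need checking across seeds.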