Where Am I? Comparing CNN and LSTM for Location Classification in Egocentric Videos

Authors: Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, Remco Veltkamp

Abstract

Egocentric vision is applied in a variety of fields such as life-logging, sports recording and robot navigation. A substantial body of research focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. The basis of this work is the idea that locations can be characterized by the presence of specific objects. Our objective is the recognition of locations in egocentric videos that mainly consist of indoor house scenes. We perform an extensive comparison between CNN- and LSTM-based classification methods that aim to identify the in-house location by classifying objects extracted with a state-of-the-art object detector. We show that location classification is affected by the quality of the detected objects, i.e. by false detections interspersed among the correct ones across a series of frames, but that this effect can be greatly limited by taking the temporal structure of the information into account with an LSTM. Finally, we discuss the potential for useful real-world applications.

In Proceedings of: SmarterAAL'18 Workshop, IEEE International Conference on Pervasive Computing and Communications (PerCom)
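
To make the LSTM-based setup described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: each frame is represented by a bag-of-detected-objects vector produced by an object detector, and an LSTM classifies the location of the clip from the sequence of such vectors. The class name, layer sizes, and the numbers of object and location categories are illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_OBJECT_CLASSES = 80   # detector vocabulary size (assumed for illustration)
NUM_LOCATIONS = 8         # e.g. kitchen, bedroom, bathroom, ... (assumed)

class ObjectSequenceLSTM(nn.Module):
    """Hypothetical sequence classifier over per-frame object-detection vectors."""
    def __init__(self, num_objects, num_locations, hidden_size=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_objects,
                            hidden_size=hidden_size,
                            batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_locations)

    def forward(self, x):
        # x: (batch, num_frames, num_objects) -- per-frame object histograms
        _, (h_n, _) = self.lstm(x)
        # Use the final hidden state to predict the location of the whole clip,
        # so temporal context can smooth over occasional false detections.
        return self.classifier(h_n[-1])

# Usage: a batch of 4 clips, 30 frames each, with per-frame detection counts.
model = ObjectSequenceLSTM(NUM_OBJECT_CLASSES, NUM_LOCATIONS)
detections = torch.rand(4, 30, NUM_OBJECT_CLASSES)
logits = model(detections)              # shape (4, NUM_LOCATIONS)
predicted_location = logits.argmax(dim=1)
```

A per-frame (CNN-style) baseline would instead classify each frame's object vector independently; the sequence model above differs only in that the prediction is conditioned on the entire window of frames, which is the temporal effect the abstract refers to.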