Time-Dependent Bag of Words on Manifolds for Geodesic-Based Classification of Video Activities towards Assisted Living and Healthcare
Journal article, 2017
In this paper, we address the problem of classifying activities of daily living (ADL) in video. The basic idea of the proposed method is to treat each human activity in a video as a temporal sequence of points on a Riemannian manifold, and to classify such time series with a geodesic-based kernel. The main novelties of this paper are as follows: (a) for each frame of a video, low-level features of body pose and human-object interaction are unified by a covariance matrix, i.e., a manifold point in the space of symmetric positive definite (SPD) matrices $Sym_+^d$; (b) a time-dependent bag-of-words (BoW+T) model is built, whose codebook is generated by clustering per-frame covariance matrices on $Sym_+^d$; (c) for each video, high-level BoW+T features are extracted from its corresponding sequence of per-frame covariance matrices; (d) for activity classification, a positive definite kernel is formulated that takes into account the underlying geometry of the BoW+T features, i.e., the unit $n$-sphere. Experiments were conducted on two video datasets. The first dataset contains 8 activity classes with a total of 943 videos, and the second contains 7 activity classes with a total of 224 videos. The proposed method achieved high accuracy (89.66% on average) and a low false-alarm rate (1.43% on average) on the first dataset. A comparison with 6 existing methods on the second dataset provided further evidence of the effectiveness of the proposed method.
Time-dependent bag-of-words (BoW+T) model
Activity of daily living (ADL)
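The two geometric ingredients named in the abstract, a per-frame covariance descriptor on the SPD manifold and a geodesic distance on the unit sphere, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `frame_covariance` and `sphere_geodesic`, the regularization constant `eps`, and the random input data are all assumptions made for illustration, and the abstract does not specify how the kernel is built from the geodesic distance.

```python
import numpy as np

def frame_covariance(features, eps=1e-6):
    """Covariance descriptor for one frame (hypothetical helper).

    features: (n_samples, d) array of low-level feature vectors.
    A small multiple of the identity is added (an assumption) so the
    result is strictly positive definite, i.e. a valid SPD matrix.
    """
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

def sphere_geodesic(u, v):
    """Arc-length distance between u and v after scaling both to unit norm."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(u @ v, -1.0, 1.0)))

# Synthetic stand-in for the low-level features of a single frame.
rng = np.random.default_rng(0)
C = frame_covariance(rng.normal(size=(50, 4)))

# Two toy feature vectors, standing in for per-video BoW+T features.
d_orth = sphere_geodesic(np.array([1.0, 0.0]), np.array([0.0, 3.0]))
d_same = sphere_geodesic(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
```

In this sketch, `C` is a 4-by-4 SPD matrix, and the geodesic distance depends only on the direction of each vector, which is why vectors of different lengths along the same axis are at distance zero.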