Stochastic Modeling for Video Object Tracking and Online Learning: Manifolds and Particle Filters
Classical visual object tracking techniques are effective when the parameters of the underlying process lie in a vector space. However, many parameter spaces that arise in visual tracking violate this assumption. This thesis investigates robust visual object tracking and online learning methods for parameter spaces with vector or manifold structure.

For vector spaces, two video tracking methods are proposed. The first builds on an anisotropic mean-shift tracker for appearance similarity and an SIR particle filter for tracking the bounding box. The anisotropic mean shift is derived for a partitioned rectangular bounding box and several partition prototypes, with an adaptive learning strategy for the reference object distributions. The joint scheme retains the merits of both methods: it uses a small number of particles (<20) and stabilizes target trajectories during partial occlusions and background clutter. The second object-tracking algorithm combines point feature correspondences, object appearance similarity, and an optimal selection criterion for the final tracking. The point-feature-based tracker simultaneously exploits and dynamically maintains two separate sets of point feature correspondences, in the foreground and in the surrounding background, where the background features serve to indicate occlusions. The appearance-based tracker uses an enhanced anisotropic mean shift with a fully tunable (five degrees of freedom) bounding box; the enhancement is achieved by partially guiding it with the point feature tracker and by dynamically updating the reference object models. The proposed tracker is shown to be more robust, with reduced tracking drift.

The contributions on manifold tracking and online learning focus on symmetric manifolds (the set of covariance matrices) and Grassmann manifolds (the set of subspaces).
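The joint scheme above, an SIR particle filter whose particles are weighted by appearance similarity to a reference model, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the mean-shift refinement and partitioned bounding box are omitted, the state is a plain [cx, cy, w, h] box, and `observe_hist`, the random-walk motion model, and all parameter values are assumptions made for the demo.

```python
import numpy as np

def bhattacharyya(p, q):
    """Appearance similarity between two normalized color histograms."""
    return float(np.sum(np.sqrt(p * q)))

def sir_particle_filter_step(particles, weights, reference_hist,
                             observe_hist, motion_std=2.0, rng=None):
    """One SIR (sampling importance resampling) step for bounding-box
    states [cx, cy, w, h]; `observe_hist(state)` is a hypothetical helper
    returning the candidate color histogram at that state."""
    rng = rng or np.random.default_rng(0)
    n = len(particles)
    # 1. Propagate particles with a simple random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # 2. Weight by appearance likelihood exp(-lambda * D^2), with
    #    D^2 = 1 - Bhattacharyya coefficient (a common choice).
    rho = np.array([bhattacharyya(reference_hist, observe_hist(s))
                    for s in particles])
    weights = np.exp(-100.0 * (1.0 - rho))
    weights /= weights.sum()
    # 3. Resample (multinomial) to avoid weight degeneracy.
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)

# Toy demo: track a static "object" centered at (50, 50) whose
# observed histogram degrades with distance from the true center.
reference = np.array([0.7, 0.2, 0.1])

def observe_hist(state):
    d = np.hypot(state[0] - 50, state[1] - 50)
    h = np.array([0.7, 0.2, 0.1]) + 0.01 * d * np.array([-1.0, 0.5, 0.5])
    h = np.clip(h, 1e-6, None)
    return h / h.sum()

rng = np.random.default_rng(0)
particles = rng.normal([50, 50, 20, 30], 5.0, size=(15, 4))  # <20 particles
weights = np.full(15, 1.0 / 15)
for _ in range(10):
    particles, weights = sir_particle_filter_step(
        particles, weights, reference, observe_hist, rng=rng)
estimate = particles.mean(axis=0)  # posterior-mean bounding box
```

In the thesis's joint scheme the appearance weighting is what allows so few particles: the mean shift pulls each propagated particle toward a local appearance mode before weighting, which this sketch leaves out.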
The online appearance learning is based on Bayesian estimation of state variables (related to object appearances) on these manifolds through a dual dynamic model. The model is realized through two mappings, the exponential and logarithmic maps, between tangent planes and the manifolds. The tracking part is based on Bayesian estimation of state variables (related to the affine object bounding box) with the manifold appearance embedded. Tracking and online learning are performed in an alternating fashion to mitigate tracking drift.

Moreover, for symmetric manifolds, Gabor features at different frequencies and orientations are introduced for the covariance descriptor; these are effective for both visual and infrared video objects. Further, spatial information is incorporated into the covariance descriptor by extracting features in a partitioned bounding box. For Grassmann manifolds, a novel method is introduced to detect partial occlusions: the appearance subspace is updated in each time interval only if there is an indication of stable and reliable tracking without background interference. Comparisons with existing trackers further show that the proposed manifold framework performs better.
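The two mappings between tangent planes and the manifold can be made concrete for the covariance-matrix case. The sketch below shows the exponential and logarithmic maps on the SPD manifold under the affine-invariant metric, computed via eigendecomposition; it is an illustrative implementation under those assumptions, and the function names are mine, not the thesis's.

```python
import numpy as np

def _sym_fun(A, f):
    """Apply a scalar function f to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def spd_log(P, Q):
    """Logarithmic map: project the SPD matrix Q onto the tangent plane
    at P (affine-invariant metric): P^(1/2) log(P^(-1/2) Q P^(-1/2)) P^(1/2)."""
    Ph = _sym_fun(P, np.sqrt)                   # P^(1/2)
    Pih = _sym_fun(P, lambda w: 1 / np.sqrt(w))  # P^(-1/2)
    return Ph @ _sym_fun(Pih @ Q @ Pih, np.log) @ Ph

def spd_exp(P, V):
    """Exponential map: take the tangent vector V at P back to the
    manifold: P^(1/2) exp(P^(-1/2) V P^(-1/2)) P^(1/2)."""
    Ph = _sym_fun(P, np.sqrt)
    Pih = _sym_fun(P, lambda w: 1 / np.sqrt(w))
    return Ph @ _sym_fun(Pih @ V @ Pih, np.exp) @ Ph

# Round trip: mapping Q to the tangent plane at P and back recovers Q,
# which is what lets appearance updates be done in the (flat) tangent
# plane before returning to the manifold.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)); P = A @ A.T + 4 * np.eye(4)  # SPD "model"
B = rng.normal(size=(4, 4)); Q = B @ B.T + 4 * np.eye(4)  # SPD "observation"
V = spd_log(P, Q)       # tangent vector (symmetric, not necessarily SPD)
Q2 = spd_exp(P, V)      # back on the manifold
```

In a dual dynamic model of the kind described above, filtering of the appearance state happens in the tangent plane at the current model (where vector-space machinery applies), and `spd_exp` returns the updated estimate to the manifold.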
Keywords: anisotropic mean shift, online learning of reference object, consensus point feature correspondences, visual object tracking
VG, Sven Hultins gata 6, Chalmers University of Technology
Opponent: Prof. Hamid Aghajan, Director, AIR (Ambient Intelligence Research) Lab, Wireless Sensor Networks Lab, Department of Electrical Engineering, Stanford University, USA