Machine Learning Methods Using Class-specific Subspace Kernel Representations for Large-Scale Applications
Doctoral thesis, 2016
Kernel techniques rose to popularity alongside the success of Support Vector Machines (SVMs). During the last two decades, the kernel idea itself has been extracted from the SVM framework and is now widely studied as an independent subject. Essentially, kernel methods are nonlinear transformation techniques that map data from an input set into a high-dimensional (possibly infinite-dimensional) vector space, called the Reproducing Kernel Hilbert Space (RKHS), in which linear models can be applied. The original input set may contain data from different domains and applications, such as tweets, movie ratings, images, or medical measurements. The two spaces are connected by a Positive Semi-Definite (PSD) kernel function, and all computations in the RKHS are carried out in the low-dimensional input space by evaluating the kernel function (the so-called kernel trick).
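As a minimal illustration of this idea, the sketch below (using the common RBF kernel; the choice of kernel and parameter `gamma` are assumptions, not prescribed by the text) builds a Gram matrix and computes a distance between two points in the RKHS purely from kernel evaluations in the input space:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2), a PSD kernel."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # five points from the input set
K = rbf_kernel(X, X)          # 5 x 5 Gram matrix

# Squared distance between phi(x_0) and phi(x_1) in the RKHS,
# evaluated entirely in the input space via the kernel trick:
d2 = K[0, 0] + K[1, 1] - 2.0 * K[0, 1]
```

Note that the (possibly infinite-dimensional) feature map is never formed explicitly; all geometry in the RKHS reduces to entries of `K`.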
Kernel methods have proven effective in a wide range of applications. However, the computational complexity of most kernel algorithms typically grows cubically, or at least quadratically, with the training set size, because a Gram kernel matrix must be constructed and/or inverted. To improve scalability for large-scale training, kernel approximation techniques are employed, in which the kernel matrix is assumed to have a low-rank structure. Essentially, this is equivalent to assuming a subspace model spanned by a subset of the training data in the RKHS. The task is then to estimate this subspace with respect to some criterion, such as the reconstruction error or, for classification tasks, the discriminative power.
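One standard instance of this subset-based low-rank idea is the Nyström approximation; the sketch below is a toy version of it (landmark count `m`, random landmark selection, and kernel parameters are illustrative assumptions), not the specific technique developed in the thesis:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))

# Pick m landmark points (a subset of the training data); the
# subspace they span in the RKHS defines the low-rank model.
m = 20
idx = rng.choice(len(X), size=m, replace=False)
C = rbf_kernel(X, X[idx])                 # n x m cross-kernel block
W = C[idx, :]                             # m x m landmark Gram matrix
K_approx = C @ np.linalg.pinv(W) @ C.T    # rank <= m approximation of K

K = rbf_kernel(X, X)                      # exact n x n Gram matrix
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Only the `n x m` block `C` and the small `m x m` matrix `W` ever need to be stored or inverted, which is the source of the scalability gain over working with the full `n x n` Gram matrix.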
Based on these motivations, this thesis focuses on the development of scalable kernel techniques for supervised classification problems. Inspired by the subspace classifier and kernel clustering models, we propose the CLAss-specific Subspace Kernel (CLASK) representation, in which class-specific kernel functions are applied and individual subspaces are constructed accordingly. An automatic model selection technique is proposed to choose the most suitable kernel functions for each class, based on a criterion using the subspace projection distance. Moreover, subset selection and transformation techniques using CLASK are developed to further reduce the model complexity while enhancing the discriminative power for kernel approximation and classification. Furthermore, we propose both a parallel and a sequential framework for tackling large-scale learning problems.
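To make the underlying subspace-classifier idea concrete, the toy sketch below builds one kernel subspace per class (via an eigendecomposition of each class Gram matrix) and assigns a test point to the class whose subspace captures most of its feature-space norm. This is a hypothetical, CLAFIC-style illustration of the classical idea that CLASK builds on; the shared RBF kernel, the per-class rank `r`, and the scoring rule are assumptions, not the thesis algorithm:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_subspace(Xc, r, gamma=1.0):
    """Top-r eigenpairs of the class Gram matrix define an r-dimensional
    subspace of the RKHS spanned by this class's training data."""
    K = rbf_kernel(Xc, Xc, gamma)
    lam, A = np.linalg.eigh(K)                 # ascending eigenvalues
    lam, A = lam[::-1][:r], A[:, ::-1][:, :r]  # keep the top r
    return Xc, A / np.sqrt(np.maximum(lam, 1e-12))

def projection_norm2(x, model, gamma=1.0):
    """||P_c phi(x)||^2: squared norm of the projection of phi(x)
    onto the class subspace, computed from kernel evaluations only."""
    Xc, B = model
    kx = rbf_kernel(Xc, x[None, :], gamma)[:, 0]
    return float(np.sum((B.T @ kx) ** 2))

# Two toy Gaussian classes in the input space.
rng = np.random.default_rng(2)
X0 = rng.normal(loc=-2.0, size=(40, 2))
X1 = rng.normal(loc=+2.0, size=(40, 2))
models = [fit_subspace(X0, r=5), fit_subspace(X1, r=5)]

# Classify a point near class 0 by maximum projection norm.
x = np.array([-2.1, -1.9])
scores = [projection_norm2(x, m) for m in models]
label = int(np.argmax(scores))   # expected: 0
```

Replacing the single shared kernel with a per-class kernel, chosen automatically, is precisely where a class-specific representation such as CLASK departs from this classical setup.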