Classification of High Dimensional Signals with Small Training Sample Size with Applications towards Microwave Based Detection Systems
Licentiate thesis, 2013
Classification techniques address the problem of assigning data to two or more classes. The distribution of the data is therefore the most critical thing to know. Unfortunately, the specification of the data-generating process is rarely available in practice, and a parametric probability density model is not always applicable, especially for High Dimensional data with Low (training) Sample Size (HDLSS). This raises the importance of developing data-driven techniques, in which the data model is assumed on the basis of partially available prior knowledge or selected by cross-validation. Popular data assumptions include centroid-based models, linear subspace models, and manifold data structures. In choosing among them, one should take into account model accuracy, computational complexity, and generalization ability, and be aware of the risk of overfitting. When the dimensionality of the data is much higher than the training sample size, all of these issues are inherent to the problem, and there is no easy way to find a good trade-off.
In this work, we mainly focus on the first two types of data models, centroid-based and linear subspace models, and develop corresponding classification techniques. The first objective is to learn the data-generating model automatically from the limited number of training samples available. The second objective is to maximize class separability under the assumed data model. The applications studied encompass both simulated and measured microwave signals for stroke type diagnostics and wood quality assessment. The results are analyzed and compared with more classical approaches.
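As a rough illustration of what a centroid-based data model means in the HDLSS setting, the following minimal sketch implements a plain nearest-centroid classifier. It is a textbook baseline and not the method developed in the thesis. The function names, the toy Gaussian data, the dimension (200) and the number of training samples per class (5) are all assumptions made purely for illustration.

```python
import random


def centroid(samples):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def fit_centroids(X, y):
    """Group training vectors by label and return {label: centroid}."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    return {label: centroid(rows) for label, rows in by_class.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], x))


def make_sample(rng, shift, dim):
    """Hypothetical toy generator: unit-variance Gaussian noise around a constant mean."""
    return [rng.gauss(shift, 1.0) for _ in range(dim)]


# Toy HDLSS setting (assumed values): 200 dimensions, only 5 training samples per class.
rng = random.Random(0)
dim, n_per_class = 200, 5
X = ([make_sample(rng, 0.0, dim) for _ in range(n_per_class)]
     + [make_sample(rng, 1.0, dim) for _ in range(n_per_class)])
y = [0] * n_per_class + [1] * n_per_class

centroids = fit_centroids(X, y)
test_point = make_sample(rng, 1.0, dim)  # drawn from the class-1 generator
print(predict(centroids, test_point))
```

Each class is summarized by a single mean vector, so only one vector per class has to be estimated from the few training samples. This simplicity is one reason centroid-based assumptions are attractive when the dimensionality far exceeds the sample size, at the cost of ignoring any within-class structure that a linear subspace or manifold model could capture.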