Classification error in multiclass discrimination from Markov data
Journal article, 2015
© 2015 Springer Science+Business Media Dordrecht
As a model for an on-line classification setting we consider a stochastic process (Formula presented.), the present time-point being denoted by 0, with observables (Formula presented.) from which the pattern (Formula presented.) is to be inferred. In this setting, in addition to the present observation (Formula presented.), a number l of preceding observations may be used for classification, thus taking into account a possible dependence structure such as occurs, e.g., in an ongoing classification of handwritten characters. We address the question of how the performance of classifiers is improved by using such additional information. For our analysis, a hidden Markov model is used. Letting (Formula presented.) denote the minimal risk of misclassification using l preceding observations, we show that the difference (Formula presented.) decreases exponentially fast as l increases. This suggests that a small l might already lead to a noticeable improvement. To pursue this point we look at the use of past observations for kernel classification rules. Our practical findings in simulated hidden Markov models and in the classification of handwritten characters indicate that using l = 1, i.e. just the last preceding observation in addition to (Formula presented.), can lead to a substantial reduction of the risk of misclassification. So, in the presence of stochastic dependencies, we advocate using (Formula presented.) for finding the pattern (Formula presented.) instead of only (Formula presented.), as one would in the independent situation.
Hidden Markov model
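The comparison described in the abstract, a kernel classification rule using only the present observation (l = 0) versus one that also uses the last preceding observation (l = 1), can be illustrated with a small simulation. The following is a minimal sketch and not the authors' experimental setup. The two-state chain, the Gaussian emissions, the Gaussian kernel, the bandwidth h, the symbols X and Y, and all function names are assumptions made here for illustration only.

```python
import math
import random


def simulate_hmm(n, stay=0.95, sd=1.5, rng=None):
    """Simulate a two-state hidden chain Y_t in {0, 1} with Gaussian observations X_t.

    All parameters (stay probability, means, sd) are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    means = (-1.0, 1.0)
    y = rng.randrange(2)
    xs, ys = [], []
    for _ in range(n):
        if rng.random() > stay:  # switch the hidden state with probability 1 - stay
            y = 1 - y
        xs.append(rng.gauss(means[y], sd))
        ys.append(y)
    return xs, ys


def windows(xs, ys, l):
    """Pair each label Y_t with the feature vector (X_{t-l}, ..., X_t)."""
    return [(tuple(xs[t - l:t + 1]), ys[t]) for t in range(l, len(xs))]


def kernel_classify(train, x, h=0.5):
    """Kernel rule: Gaussian-weighted vote of the training labels around x."""
    votes = [0.0, 0.0]
    for z, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, z))
        votes[label] += math.exp(-d2 / (2.0 * h * h))
    return 0 if votes[0] >= votes[1] else 1


def error_rate(l, n_train=1000, n_test=1500, seed=1):
    """Empirical misclassification rate of the kernel rule using l past observations."""
    rng = random.Random(seed)
    xs, ys = simulate_hmm(n_train + n_test, rng=rng)
    train = windows(xs[:n_train], ys[:n_train], l)
    test = windows(xs[n_train:], ys[n_train:], l)
    wrong = sum(kernel_classify(train, x) != y for x, y in test)
    return wrong / len(test)
```

Calling error_rate(0) and error_rate(1) gives empirical error estimates for the two rules on the same simulated path. With strongly dependent hidden states and heavily overlapping emissions, as assumed here, one would expect the l = 1 rule to have the smaller error, though the exact values depend on the seed and the chosen parameters.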