Machine-Printed and Handwritten Ethiopic Script Recognition
Doctoral thesis, 2009

A written language is represented by machine-printed or handwritten symbols called characters. For automatic recognition, handwritten script can be captured offline (by a scanner) or online (by electronic digital devices), whereas machine-printed text is captured offline. Correspondingly, recognition can be performed offline (after writing is completed) or online (at the time of writing). This thesis presents recognition systems for machine-printed and handwritten Ethiopic script. The main focus is offline recognition of machine-printed and handwritten Ethiopic script, but the work is also extended to online recognition of handwritten characters. The offline recognition system covers machine-printed and handwritten characters as well as handwritten Amharic words. Lexicons and hidden Markov models (HMMs) are used to recognize Amharic words in unconstrained handwritten text. In both the machine-printed and the handwritten recognition systems, a similar set of features, called primitive structural features, together with their spatial relationships, is proposed as the basic unit of recognition. The idea is to represent graphically complex characters by less complex primitive structures and their spatial interrelationships. The advantage is that these features are easier to extract and process for recognition than the complex-shaped characters themselves. For offline recognition of machine-printed and handwritten script, the features are extracted using the 2D direction field tensor. The resulting direction field image is also used for text line detection, character segmentation, and word segmentation. For online recognition, a 2D direction field tensor parameterized by time is used to extract the features. The thesis also presents datasets for testing the performance of recognition systems. The datasets, referred to as EDIDB for machine-printed script and DEHR for handwritten script, were collected from real-life sources and various writers. They are made publicly available to facilitate progress in recognition. Unless specified otherwise, the test results reported in the thesis are based on these datasets.
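As a rough illustration of the direction field idea mentioned in the abstract, the following Python sketch computes a complex-valued 2D structure tensor for a grayscale image and reads off the locally dominant gradient orientation. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names (`box_smooth`, `direction_field`, `dominant_orientation`), the use of `np.gradient` for derivatives, and the box-filter averaging window are all illustrative choices.

```python
import numpy as np

def box_smooth(a, size):
    """Average a real 2D array over a size x size window (separable box filter)."""
    kernel = np.ones(size) / size
    a = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, a)
    a = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, a)
    return a

def direction_field(image, size=5):
    """Return (i20, i11) for a grayscale image.

    i20 is the locally averaged squared complex gradient; its argument is
    twice the dominant gradient orientation. i11 is the locally averaged
    gradient energy. Window shape and size are assumptions of this sketch.
    """
    fy, fx = np.gradient(image.astype(float))  # derivatives along rows (y) and columns (x)
    g2 = (fx + 1j * fy) ** 2                   # squared complex gradient
    i20 = box_smooth(g2.real, size) + 1j * box_smooth(g2.imag, size)
    i11 = box_smooth(fx ** 2 + fy ** 2, size)
    return i20, i11

def dominant_orientation(i20):
    """Dominant gradient orientation in radians, in (-pi/2, pi/2]."""
    return 0.5 * np.angle(i20)

# Usage: a vertical stroke has a horizontal gradient, so the orientation is near 0.
stroke = np.zeros((20, 20))
stroke[:, 8:12] = 1.0
i20, i11 = direction_field(stroke)
theta = dominant_orientation(i20)
```

The stroke direction is perpendicular to the gradient orientation returned here. Where `i11` is nonzero, the ratio `abs(i20) / i11` lies between 0 and 1 and can serve as a confidence measure for how strongly a single direction dominates the neighborhood.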

HMM

Handwriting Recognition

OCR

Direction Field Tensor

Ethiopic Character Recognition

Amharic Word Recognition

Structure Tensor

Wigforssalen in house Visionen at Halmstad University, Halmstad
Opponent: Dr. Gabriella Sanniti di Baja

Author

Yaregal Assabie

Chalmers, Signals and Systems

Subject categories

Systems Science

Electrical Engineering and Electronics

Signal Processing

Computer Vision and Robotics (Autonomous Systems)

ISBN

978-91-7385-293-7

Doctoral theses at Chalmers University of Technology. New series: 2974

More information

Created

2017-10-07