Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/4844
Title: Efficient Hidden Markov Models for Online Handwritten Gurmukhi Script Recognition
Authors: Verma, Karun
Sharma, R. K. (Guide)
Keywords: Online handwritten character recognition systems
SVM
HMM
Auxiliary Information
Feature extraction
Writing-zone identification
Classification
Issue Date: 11-Sep-2017
Abstract: Handwriting recognition is the process of converting handwritten characters into a machine-processable format. Handwritten characters can be presented to a machine either online or offline; the work presented in this thesis deals with online handwritten character recognition. A good amount of research in this area has been reported for English, Chinese, Japanese and Korean. Research on developing online handwriting recognition systems for some Indian languages, such as Bangla, Hindi, Malayalam, Tamil and Telugu, has been conducted by many researchers in the past decade. In this thesis, we have worked on the development of an online handwritten character recognition system for the Punjabi language. This language is popular among the people living in Punjab, a North Indian state, and also among Indians living in other parts of the world such as Australia, Canada, New Zealand and the USA. Gurmukhi is the script used to write this language. Some researchers have also worked on building online handwriting recognition systems for Gurmukhi script. This thesis carries forward the work done by these researchers. An effort has also been made to improve the classification strategies for the recognition of Gurmukhi characters in an online handwriting recognition environment. The work done here is organized in seven chapters.
 The first chapter introduces the characteristics of Gurmukhi script and highlights the need for developing a handwritten character recognition system for this script. This chapter elucidates the major issues in online handwritten character recognition and their complexities. The processes involved in online handwritten character recognition are also explained. A detailed study of the related literature has been carried out and is presented in this chapter, separately for each of the involved processes. Chapter two focuses on the process of data collection, the selection of writers, and how the data is stored. A fairly large collection of 44,221 samples of 2,048 unique Gurmukhi words containing 2,41,239 strokes has been collected from 167 writers. This chapter also illustrates the identified stroke classes and outlines the common preprocessing techniques of normalization, centering, duplicate point removal, smoothing and resampling.
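The preprocessing steps named above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the 3-point moving average, the arc-length resampling, the default of 64 output points and the order of the steps are all assumptions made for illustration. Only the 300×300 window size comes from the abstract.

```python
import math

def normalize_and_center(points, size=300.0):
    """Scale a stroke to fit a size x size box (aspect ratio kept) and center it."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    extent = max(width, height)
    scale = size / extent if extent > 0 else 1.0
    off_x = (size - width * scale) / 2.0
    off_y = (size - height * scale) / 2.0
    return [((x - min_x) * scale + off_x, (y - min_y) * scale + off_y)
            for x, y in points]

def remove_duplicate_points(points):
    """Drop points that repeat the immediately preceding point."""
    out = []
    for p in points:
        if not out or p != out[-1]:
            out.append(p)
    return out

def smooth(points):
    """3-point moving average (an assumed filter); endpoints are kept as-is."""
    if len(points) < 3:
        return list(points)
    out = [points[0]]
    for i in range(1, len(points) - 1):
        out.append((
            (points[i - 1][0] + points[i][0] + points[i + 1][0]) / 3.0,
            (points[i - 1][1] + points[i][1] + points[i + 1][1]) / 3.0,
        ))
    out.append(points[-1])
    return out

def resample(points, n_points=64):
    """Resample to n_points (>= 2) equally spaced along the stroke's arc length."""
    cum = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:
        return [points[0]] * n_points
    out, j = [], 0
    for i in range(n_points):
        target = total * i / (n_points - 1)
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = (target - cum[j]) / seg if seg > 0 else 0.0
        out.append((points[j][0] + t * (points[j + 1][0] - points[j][0]),
                    points[j][1] + t * (points[j + 1][1] - points[j][1])))
    return out

def preprocess(points, size=300.0, n_points=64):
    """Apply the steps in the order the abstract lists them (order is assumed)."""
    pts = normalize_and_center(points, size)
    pts = remove_duplicate_points(pts)
    pts = smooth(pts)
    return resample(pts, n_points)
```

A stroke is represented here as a list of `(x, y)` tuples, so `preprocess([(0, 0), (0, 0), (10, 0), (10, 10)], n_points=8)` returns eight points inside the 300×300 window.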
 In Chapter three, classification of Middle-zone strokes using zone-based features with various support vector machine kernels is presented. This chapter introduces five different zone-based features and illustrates how these are extracted from the stroke data. One hundred samples each of eighty-two different Middle-zone stroke classes have been used for training four different support vector machine kernels, namely, linear, polynomial, radial basis function, and sigmoid. Other parameters, including k in k-fold cross-validation, learning rate (γ) and tolerance limit (ε), have been tuned for the classification of strokes using empirical search and grid search in order to determine their optimal values. One of the five feature sets considered in this chapter yielded the highest recognition rate (93.1%). Grid search is found to be better than empirical search for selecting kernel hyperparameters, since the parameter values selected using this approach yielded higher recognition accuracy.
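Grid search with k-fold cross-validation, as mentioned above, can be sketched in a classifier-agnostic way. This is only an illustration under stated assumptions: `fit_and_score` is a hypothetical callback that would train a classifier (for example an SVM) on the training indices and return its accuracy on the test indices, and `toy_score` is a stand-in with made-up behaviour, not a result from the thesis.

```python
from itertools import product

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def grid_search(param_grid, n_samples, k, fit_and_score):
    """Try every parameter combination; return (best_params, best_mean_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_score(params, train_idx, test_idx):
    """Hypothetical stand-in for real training; peaks at kernel='rbf', gamma=0.1."""
    base = 0.9 if params["kernel"] == "rbf" else 0.8
    return base - abs(params["gamma"] - 0.1)

# Assumed grid: the four kernels named in the abstract and illustrative gamma values.
grid = {"kernel": ["linear", "poly", "rbf", "sigmoid"],
        "gamma": [0.01, 0.1, 1.0]}
best, score = grid_search(grid, n_samples=20, k=5, fit_and_score=toy_score)
```

Unlike an empirical (hand-tuned) search, the loop evaluates every combination in the grid, which is why it cannot miss the best setting among the listed values.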
 Chapter four presents the study undertaken to train a coherent classifier for recognition of online handwritten Gurmukhi script characters. Seventy-two different SVM- and HMM-based classifiers have been trained for recognition of online handwritten Gurmukhi characters. In this chapter, five different features, namely, Normalized x-y traces; Region based features; Curvature features; Curvature feature based classes; and Directional features, have been extracted from three normalized window sizes. For each window size and feature-classifier combination, k-fold cross-validation for k = 3, 4, and 5 has also been illustrated in this chapter. Here, forty-five SVM-based and twenty-seven HMM-based classifiers have been experimented with. The feature-classifier combinations have been found to be efficient for the 300×300 window size. The top three feature-classifier combinations on the 300×300 window size have further been tested on a new dataset containing 35 basic characters of Gurmukhi script. A recognition rate of 96.4% has been achieved with an HMM-based classifier using Curvature based feature classes. A voting-based classification model based on the top three feature-classifier combinations has also been implemented in this chapter. This voting-based model achieved a character recognition rate of 96.7%. An average character recognition time of 78.5 milliseconds has been recorded for the voting-based classifier.
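One simple way to combine three classifiers by voting is a majority vote with a tie-break, sketched below. The abstract does not describe the actual voting rule, so the tie-break (prefer the earliest, that is highest-ranked, classifier among the tied labels) and the label strings are assumptions.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over labels predicted by several classifiers.

    `predictions` is ordered from the highest-ranked classifier to the
    lowest. If several labels tie for the most votes, the tied label that
    appears first in `predictions` wins (an assumed tie-break rule).
    """
    counts = Counter(predictions)
    top_count = max(counts.values())
    tied = {label for label, count in counts.items() if count == top_count}
    for label in predictions:
        if label in tied:
            return label

# Hypothetical labels from three classifiers for one character.
print(vote(["ka", "ka", "kha"]))   # two of three agree
print(vote(["ka", "kha", "ga"]))   # all disagree: falls back to the first
```

With three voters, a tie can only occur when all three disagree, in which case the result is simply the top-ranked classifier's prediction.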
 In Chapter five, a study on writing-zone identification is presented. Two different approaches to a handwritten character recognition system for Gurmukhi script, both based on zone identification, have been presented in this chapter. A dataset consisting of basic characters, characters with vowel modifiers, and characters with subjoined symbols has been considered for experimentation. In the first approach, three different SVM-based classifiers have been trained on 99 stroke classes in the three writing zones, namely, Upper-zone, Middle-zone, and Lower-zone. The writing zone of a stroke is identified before preprocessing. Based on the identified zone, the stroke is classified using one of the three classifiers. Here, an accuracy of 95.3% has been achieved for zone identification, and an accuracy of 74.8% has been achieved for character recognition. In the second approach, a single HMM-based classifier is trained with 74 stroke classes from all writing zones. A more efficient zone identification algorithm has also been implemented, giving a zone identification accuracy of 97.7%. Writing-zone information has been used to generate the final character in the postprocessing phase. In this experiment, an accuracy of 88.4% has been achieved for character generation for Gurmukhi script.
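The control flow of the first approach (identify the zone, then dispatch to that zone's classifier) might look like the sketch below. The zone rule used here, comparing the stroke's mean y-coordinate with a headline and a baseline position, is an assumption for illustration and is not the thesis's zone identification algorithm. The parameters `headline_y` and `baseline_y` and the stub classifiers are hypothetical.

```python
def identify_zone(stroke, headline_y, baseline_y):
    """Assign a stroke to a writing zone from its mean y-coordinate.

    Assumes screen coordinates (y grows downward), with `headline_y`
    above `baseline_y` on the page.
    """
    mean_y = sum(y for _, y in stroke) / len(stroke)
    if mean_y < headline_y:
        return "upper"
    if mean_y > baseline_y:
        return "lower"
    return "middle"

def classify_stroke(stroke, classifiers, headline_y, baseline_y):
    """Pick the classifier for the identified zone and apply it to the stroke."""
    zone = identify_zone(stroke, headline_y, baseline_y)
    return zone, classifiers[zone](stroke)

# Stub classifiers standing in for the three trained zone-specific models.
classifiers = {
    "upper": lambda stroke: "upper-zone class",
    "middle": lambda stroke: "middle-zone class",
    "lower": lambda stroke: "lower-zone class",
}

zone, label = classify_stroke([(0, 5), (10, 8)], classifiers,
                              headline_y=20, baseline_y=80)
```

A design consequence visible even in this sketch is that a zone error propagates: a stroke sent to the wrong classifier cannot be labelled correctly, which is one plausible reason to prefer a single classifier with zone information applied in postprocessing, as in the second approach.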
 Chapter six introduces and illustrates a new HMM-based classification technique using an auxiliary variable approach. The approach is motivated by work in sampling theory, wherein the mean of the main variable on a fairly large population can be estimated more efficiently using a correlated auxiliary variable. This approach has been implemented and validated in this chapter, and it has improved the recognition accuracy of HMM-based classifiers. An average improvement of 4.5% in recognition rate has been recorded for HMMs when an auxiliary feature is used.
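The sampling-theory idea that motivates this chapter can be illustrated with the classical linear regression estimator of a population mean. This sketch shows only that background idea; how the thesis incorporates an auxiliary feature into HMM training is not described in the abstract and is not reproduced here. The function and variable names are illustrative.

```python
def regression_estimate(sample_y, sample_x, population_mean_x):
    """Estimate the population mean of y using a correlated auxiliary variable x.

    Computes y_bar + b * (X_bar - x_bar), where b is the least-squares slope
    of y on x in the sample and X_bar is the known population mean of x.
    """
    n = len(sample_y)
    mean_y = sum(sample_y) / n
    mean_x = sum(sample_x) / n
    sxx = sum((x - mean_x) ** 2 for x in sample_x)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sample_x, sample_y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (population_mean_x - mean_x)

# Toy data where y = 2x + 1 exactly: the sample mean of y is 5.0, but the
# known population mean of x (5.0) corrects the estimate to 2 * 5 + 1.
estimate = regression_estimate([3.0, 5.0, 7.0], [1.0, 2.0, 3.0], 5.0)
```

In large samples the variance of this estimator is approximately the variance of the plain sample mean multiplied by (1 − ρ²), where ρ is the correlation between the main and auxiliary variables, so a strongly correlated auxiliary variable gives a markedly more efficient estimate.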
 Chapter seven summarizes the work done in the thesis and outlines the scope for future work. Based on the experiments carried out in this work, we have mentioned a few indicators, exploring which an even more efficient handwritten character recognition system for Gurmukhi script can be developed.
URI: http://hdl.handle.net/10266/4844
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Thesis on 9 Sep 2017.pdf
Size: 5.26 MB
Format: Adobe PDF

