EE368 Project - Spring 00

Handwritten Numeral Recognition
Using Eigenimage Method and Feature-Based Method

Sung-Won Yoon & Seong Taek Chung
Department of Electrical Engineering
Stanford University, Stanford, California 94305
{yoons1, bluesky} @stanford.edu


I. Abstract
II. Database
III. Eigenimage Method
     A. Decision Rule
     B. Preprocessing Method
     C. Effect of Quantization
     D. Effect of Transformation
     E. Effect of Transformation Coefficients Reduction
     F. Effect of Processing after Transformation Coefficients Reduction
     G. Optimal Eigenimage Method
IV.Feature-Based Method
     A. Preprocessing
          1. Filtering
          2. Normalization
          3. Threshold
          4. Thinning
     B. Classification
          1. Single Stage Modules
          2. Preliminary Criteria
               a. Histogram
               b. Horizontal Division
               c. Feature Extraction
          3. Secondary Criteria (Enhancement)
V. Results
VI. Conclusions
VII. References
Appendix

Abstract

We have looked into handwritten numeral recognition using eigenimage and feature-based methods. In eigenimage method our effort was concentrated on the development of efficient decision rule and preprocessing method. In feature-based method, numeral images are normalized and novel features are extracted and used in classifying the numeral images into its according numerals.  Both methods show quite a decent performance, though neural network or multi-layer decision scheme should be combined to achieve the state-of-art performance.

[TOP]


Database

We have found a public domain database constructed by E. Alpaydin, C. Kaynak, Bogazici University, Istanbul, Turkey. Their database had 32 by 32 binary level  numeral, which was already preprocessed too much to our purpose. So we have also constructed our own database, which has 104 by 155 256 gray level numeral. Our database has 350 training data and 150 validation data. After developing eigenimage method and featured-based method, we also have shown the performance using the public domain data. We have used 1932 training numeral and 632 validation data in public domain database.

[TOP]


Eigenimage Method

The target of our eigenimage method is the development of the most efficient preprocessing method of the raw data. What we mean by ``efficient'' is ``better performance with less computation and less memory storage''. First we picked the most efficient decision rule among several candidates and then we also chose the most efficient preprocessing rule after conducting several kinds of experiments. Then we conduct simulation using the decision rule and preprocessing method that we have found to be efficient.
 

Decision Rule

There can be various decision rules. In constructing eigenimages, either correlation matrix or covariance matrix can be used. As a decision metric, the norm of coefficients per each eigenimage or the norm of the difference between validation data coefficients and template data coefficients can be used.
Many literature ([1],[2],[3]) use a covariance matrix for eigenimage construction. But in constructing covariance matrix we also have to estimate the mean of all the images. And when we are dealing with updating of data set, updating covariance can be simplified when correlation is used ([1]) In correlation matrix the optimal energy concentration property is also valid. So we will use correlation matrix in constructing our eigenimages.
As a decision metric two cases were considered. When validation data comes in, we project that data on the ith eigenimage of numeral n (0,1,2,3,4,5,6,7,8,9). This coefficient is called c(i,n). For the average test data per each numeral n, we calculate projection coefficients on ith eigenimage. This coefficient is called a(i,n).
The following two equations summarize these two decision metrics.
 
1. 
2. 
.

By test we checked that 1. has about 11% error while 2. has 16% error when only the most important eigenimage is used in decision. As the number of eigenimages used increase, 1. outperform 2. much more. So we choose 1.
 

Preprocessing Method

The goal here is to look for the most efficient preprocessing method prior to constructing eigenimage. The definition of efficient preprocessing means the preprocessing method which guarantees a good performance and requires less computation and less memory storage. The computation load of eigenimage stems from three part. One comes from the initial generation of eigenimage, which can be eleviated by Kirby and Sorovich [2]. The other comes from the update of eigenimage by increasing training set, which can be eleviated by [1] at the cost of accuracy. The last computation load arise in doing judgment with validation data, whose computation time is much less than the previous ones.
 

Effect of Quantization

First we have tested the effect of quantization. Our raw data is 256 gray level. From information theoretic view point, by making it binary gray level, we lose a lot of information. But we don't know the effect of quantization in doing judgment theoretically. In making binary gray level, our threshold level (0.93/1.00) was decided to reflect the characteristics of numeral clearly while suppressing unnecessary noise. Our results (Figure 1) show that using binary gray level doesn't degrade the judgment performance greatly. So considering less memory storage, we choose the binary gray level in store our training data.
 

Effect of Transformation

Now we also think of the way to incorporate the transformation method in doing this decision. Our idea was that coefficients of certain transform might have some distinct local characteristic per each numeral. We first generated DCT coefficients and have investigated the shape carefully. But there is no finding the specific local characteristic of each numeral though some wide range trend was observed as shown below.
 
Figure 2.  DCT coefficients for numeral 0,1 and 2

To use this general trend we performed the DCT and then we generated eigenimage and did judgment. Just using DCT doesn't change the performance (Figure 3).  To our vision, the coefficients looks meaningless while original image is meaningful. But mathematically as DCT is unitary transform, which is merely the rotation. So as our all judgment process is purely mathematical, DCT doesn't make any difference in performance.
 

Effect of Transformation Coefficients Reduction

We know that DCT concentrates the energy at the upper left corner of coefficient image. So we try to use only upper left corner of the coefficients in generating eigenimage. By trial and error we have found that 32 by 32 coefficients reflect the original image quite well while requiring less memory storage and computation. The result shows that even though there is a slight degradation, if we use large number coefficients we can get almost the same level of performance.
 
Figure 4.  Effect of DCT Reduction on Error Rate

Effect of Processing after Transformation Coefficients Reduction

We have found out that DCT coefficients reduction result (Figure 5) in ringing effect in original image domain. This is expected because we did windowing in the DCT domain. But by thresholding in the time domain, it would be possible that we make our numeral clearer as shown at the image. So we have tested the effect of processing after transformation coefficients reduction. But this (Figure 6) performed worse than the original image while computation and memory storage requirement is more than the original image (because we have to have 104 by 155 coefficients not 32 by 32 because of post processing). Even though our trial (Figure 7) here was not good, it is noteworthy that we can maintain the certain thickness of numeral by DCT and thresholding regardless of the thickness of original numeral.
 

Optimal Eigenimage Method

 
Figure 8.  Optimal Eigenimage Method
[TOP]

Feature-Based Method

In feature-based method, all the numeral images are normalized in size and the features that characterize the numerals are searched within the images ([4], [5]).  These determining features are statistically common features in 'normally' written numerals.  Our feature-based method is comprised of two main parts - the preprocessing and the single-stage numeral classification stage as shown in the figure.
 
Figure 9.  Block Diagram of Feature-Based Method

Preprocessing

Filtering

Since the raw data used in feature-based method are binary images of the numerals after thresholding the 256 gray level scanned image, there are spurious features outside the region of the numeral itself.  These features were mostly few pixels in extent resembling island like structures.  They needed to be eliminated to normalize the numeral images for feature detections.  A simple 3-by-3 median filter was used to blur the isolated 'blobs' and a morphological cleaning ([6]) process was applied to rid majority of the spurious features (Figure 10).  In our experiment, 98% of the images were completely filtered.
 

Normalization

For feature extraction, a normalized image of the numerals was the first step in a robust comparison since the size of the handwritten numerals varied vastly from person to person.  Since the raw images were filtered, region of the numerals has pixel values of 1's and the the rest of the 'empty' regions are filled with only 0's.  Vertical and horizontal histograms were used to close in on the region of the numeral - the bounds of the numeral were utilized.  This region was cut out and resized to 32-by-32 using bilinear method.  Most often the numerals scanned were larger than size of 32-by-32, and the resizing resulted in decimation of the image.  In cases of numerals that were less than 32 pixels in width or length, such as 1's and numerals written very small, there was no resizing of the image.  Rather it was zero-padded appropriately on either side to make the image size of 32-by-32.  If a numeral image was interpolated, it resulted in unnecessary artifacts when the morphological thinning was applied in the later processing.
 

Threshold

Since bilinear method was used in normalizing the images to 32-by-32 size, the new images are transformed back to 256 gray level.  For the feature-based method, binary images suffice in the decision criteria, so the images were changed back to binary image by thresholding.  An empirical value of 0.5 in the scale of 0 to 1 was found to be effective for our experiments.  An important observation is that different threshold levels results in slightly different end results - preprocessing is a crucial factor in feature-based method.
 

Thinning

In extracting the features of each numerals, thickness of the handwritten numerals only hinder the correct detection, so the core parts of the handwriting needed to be extracted.  For such extraction of the core parts of the image, well known and widely used morphological 'thinning' and 'skeletonizing' methods were inspected ([6]).  The 'skeletonizing' method results in unwanted artifacts as the handwritten numerals became thicker, but the 'thinning' method did not have such problems.  So morphological thinning was used to thin out the numerals to single pixel width numerals.  In this process, some discontinuities in the numerals occurred, but a simple algorithm of 'bridging' these discontinuities were used to maintain the continuity in the numerals.
 
 

Classification

Single Stage Modules

In determining a number from the preprocessed numeral image, 10 'independent' modules (module0 to module9) were used.  Each module has several criteria for determining a number, e.g. module0 has all the criteria for the numeral image to be determined as a 0.  In another words, moduleX determines whether the numeral image is the number X or not, and it determines independently from other modules.  A numeral image is input to all of the 10 modules and moduleX determines whether the input image is a number X.  There can be more than one or no module that respond as positive to the input numeral image.  Being single stage, there is no algorithm to resolve the conflict of multiple acknowledgment of the numeral modules.
 

Preliminary Criteria

Histogram

Vertical and horizontal histograms were used to extract features of vertical and horizontal lines.  From inspections of the handwritten numerals, there was a tendency of writing the numerals vertically slanted to the right.  As a results, horizontal histograms were spread out providing less prominent features compared to the vertical histograms.  Using the vertical histograms, horizontal lines of the numerals were easily detected.  Since more salient features of the numerals can be extracted from slices relative distance from these horizontal lines, vertical histograms were crucial in locating the individual features of the numerals.  Both vertical and horizontal histograms were more extensively utilized in finding the 'bounds' of the numerals.  These 'bounds' were similarly utilized to find more salient features of each numerals.  Histograms (Figure 11) were inspected globally on the numeral image.
 
 

Horizontal Division

As mentioned, handwritten numerals were mostly vertically slanted but mostly not slanted in the horizontal direction.  To narrow down on the local features of the numerals, the images were horizontally divided into two at upper half, half, and lower half points.  A hard half division was not very effective due to the fact that people have different handwriting.  For example (Figure 12), a 6 was divided in the upper half so that the lower (larger) half encloses the loop feature of the lower portion of numeral 6.  For a 9, a division was made in the lower half for the same reason.  For each numeral, the divisions were made to maintain the key features and separate those features into two sub images.  This division ensured more locality in feature extraction of the numerals.  Histograms were reapplied within the divided images for further analysis.
 

Feature Extraction

The normalized image can be interpreted as a 32-by-32 matrix.  A slice is defined as a particular row or column of the matrix.  In extracting features to determine a numeral, these slices were inspected.  The slices were relative distance from the 'bounds' found earlier in the stage.  Only the transitions in the slice were inspected.
Number of transitions in the slice and the location of the transitions were utilized in determining the numerals in each module.  The location of the slices, number of transitions, and the location of the transitions inspected were all different according to the module.  In other words, there was no global slice applied in all of the modules, rather these criteria were inspected 'independently' depending on the module.
 
Figure 13.  Examples of Feature Extraction 

All the criteria had to be satisfied to be classified as a certain numeral.  The number of criteria inspected was typically around 6 to 7 per each module.
 

Secondary Criteria (Enhancement)

Recognition of the numerals by mentioned features had limits in discriminating numerals that were more difficult to detect.  Another easily seen feature of each numeral were the 'end-points'.  This takes into account how humans usually write numerals - the writing starts at a point and ends at another (or the same) point.  The points where the writing starts and ends are defined as the 'end-points'.  If these end-points are utilized in determining what the numeral of the image is, the restrictions of the slice information can be made less stringent.  Addition of the end-point features in the recognition of the numeral results in improvement in error rate of detection.  The more prominent improvements (Figure 14) occur in the hard-to-detect numerals.
 
 Figure 15.  Examples of Numeral Image and Its End-points
       

 [TOP]


Results

The resulting error percentage using our own database of 500 images and the public domain database  is shown in the following figures.
 
Figure 16.  Error Percentage Using Own Database

In eigenimage method, performance is generally bad except at numeral 1. In case of numeral 1, the characteristic of numeral is so distinct that decision error seldom happens. But for other numerals, eigenimage method alone is not sufficient to guarantee good performance. Numeral 6 is misjudged as numeral 0 or numeral 8 and vice versa. Numeral 7 is also misjudged by numeral 1 or numeral 9.
In feature-based method, the rejection rate and the incorrect rate showed tradeoff relationship in the process of the experiments improving the algorithm.  Efforts were made to lower the incorrect rate because numeral images that are rejected are more likely to be classified correctly as more features are incorporated and multi-stage algorithms are developed.  On the other hand, incorrectly classified numerals have no chance for improvement as the algorithm develops.  The result shows the incorrect rate being less than 6% in all the numerals considered.  The highest rejection rate in the numeral 7 is due to the fact that there were two major ways a 7 was written by a person, and the classification of 7 could not cover the range of different looking numeral images fully.
 
Figure 17.  Error Percentage Using Public Domain Database

In eigenimage method as shown in our database performance in numeral 8 and numeral 9 is really bad. Numeral 8 is misunderstood as numeral 0 or numeral 9, and numeral 9 is misunderstood as numeral 0 or 7. Though performance is relatively better than feature-based method in numeral 0, 1, 2, 3, 7, still far from the state-of-art performance. 
The error percentage of the feature-based method using the public domain database is the sum of the incorrect and the rejected percentage.  The algorithm breaks down and a higher error percentage is visible.  The reason for this discrepancy in performance is mainly due to the preprocessing of the numeral images.  First, the images in the public domain database was already normalized to 32-by-30.  Second, within that size of the image, the numerals were twice as thick as the numerals acquired from our own database which give rise to unwanted artifacts in the thinning process used in the algorithm.  Third, the numerals in the public domain database do not fully occupy the 32-by-30 region, meaning that detection of features has to be done in a more confined region with fewer pixels being used in the normalized 32-by-32 image.

[TOP]


Conclusions

In eigenimage method, we have shown that coefficient reduction after DCT is the most efficient way to do preprocessing. In feature-based method, the percentage of correct classification depends heavily on the preprocessing of the images and has more potential for improvement.  Results of both methods show different performance numeral by numeral.  For some numerals more elegant approach (e.g. neural network, multi-stage decision scheme) should be combined to enhance the performance.  And combination of eigenimage and feature-based technique could be reasonable future direction since we have seen that performance of each method differs a lot numeral by numeral.

[TOP]


References

[1] Ching Y. Suen et al., Computer Recognition of Unconstrained Handwritten Numerals, Proceedings of the IEEE, Vol. 80, No. 7, July 1992, pp.1162-1180
[2] L. Sirovich and M. Kirby, Low Dimensional Procedure for the Characterization of Human Faces, J. Opt. Soc. Amer. A, Vol. 4, No. 3, March 1987, pp. 519 – 524
[3] M. Turk and A. Pentland, Face Recognition using Eigenfaces, IEEE Conf. Computer Vision Patt. Recogn., June 1991, pp. 586 - 591
[4] P. Theodosis, A. Farhat, Computer Recognition of Handwritten Numerals by Polygonal Approximations, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-5, No. 6, November 1975, pp.610-614
[5] F. Kimura, M. Shridhar, Handwritten Numeral Recognition Based on Multiple Algorithms, Pattern Recognition, Vol. 24, No. 10, 1991, pp.969-983
[6] R. Gonzalez, R. Woods, Digital Image Processing, Addison-Wesley, 1992

[TOP]


Appendix

Though strong collaboration was sought throughout the entire project, specific implementation of eigenimage method was done by Seong Taek Chung and the feature-based method was done by Sung-Won Yoon.

[TOP]