EE368 Project - Spring 00
Handwritten Numeral Recognition
Using Eigenimage Method and Feature-Based Method
Sung-Won Yoon & Seong Taek Chung
Department of Electrical Engineering
Stanford University, Stanford, California 94305
{yoons1, bluesky} @stanford.edu
I. Abstract
II. Database
III. Eigenimage Method
A. Decision Rule
B. Preprocessing
Method
C. Effect
of Quantization
D. Effect
of Transformation
E. Effect
of Transformation Coefficients Reduction
F. Effect
of Processing after Transformation Coefficients Reduction
G. Optimal
Eigenimage Method
IV.Feature-Based Method
A. Preprocessing
1. Filtering
2. Normalization
3. Threshold
4. Thinning
B. Classification
1. Single
Stage Modules
2. Preliminary
Criteria
a. Histogram
b. Horizontal Division
c. Feature Extraction
3. Secondary
Criteria (Enhancement)
V. Results
VI. Conclusions
VII. References
Appendix
Abstract
We have looked into handwritten numeral recognition using eigenimage and
feature-based methods. In eigenimage method our effort was concentrated
on the development of efficient decision rule and preprocessing method.
In feature-based method, numeral images are normalized and novel features
are extracted and used in classifying the numeral images into its according
numerals. Both methods show quite a decent performance, though neural
network or multi-layer decision scheme should be combined to achieve the
state-of-art performance.
[TOP]
Database
We have found a public domain database constructed by E. Alpaydin, C. Kaynak,
Bogazici University, Istanbul, Turkey. Their database had 32 by 32 binary
level numeral, which was already preprocessed too much to our purpose.
So we have also constructed our own database, which has 104 by 155 256
gray level numeral. Our database has 350 training data and 150 validation
data. After developing eigenimage method and featured-based method, we
also have shown the performance using the public domain data. We have used
1932 training numeral and 632 validation data in public domain database.
[TOP]
Eigenimage Method
The target of our eigenimage method is the development of the most efficient
preprocessing method of the raw data. What we mean by ``efficient'' is
``better performance with less computation and less memory storage''. First
we picked the most efficient decision rule among several candidates and
then we also chose the most efficient preprocessing rule after conducting
several kinds of experiments. Then we conduct simulation using the decision
rule and preprocessing method that we have found to be efficient.
Decision Rule
There can be various decision rules. In constructing eigenimages, either
correlation matrix or covariance matrix can be used. As a decision metric,
the norm of coefficients per each eigenimage or the norm of the difference
between validation data coefficients and template data coefficients can
be used.
Many literature ([1],[2],[3]) use a covariance matrix for eigenimage
construction. But in constructing covariance matrix we also have to estimate
the mean of all the images. And when we are dealing with updating of data
set, updating covariance can be simplified when correlation is used ([1])
In correlation matrix the optimal energy concentration property is also
valid. So we will use correlation matrix in constructing our eigenimages.
As a decision metric two cases were considered. When validation data
comes in, we project that data on the ith eigenimage of numeral n (0,1,2,3,4,5,6,7,8,9).
This coefficient is called c(i,n). For the average test data per each numeral
n, we calculate projection coefficients on ith eigenimage. This coefficient
is called a(i,n).
The following two equations summarize these two decision metrics.
1.  |
2.  |
.
By test we checked that 1. has about 11% error while 2. has 16% error
when only the most important eigenimage is used in decision. As the number
of eigenimages used increase, 1. outperform 2. much more. So we choose
1.
Preprocessing Method
The goal here is to look for the most efficient preprocessing method prior
to constructing eigenimage. The definition of efficient preprocessing means
the preprocessing method which guarantees a good performance and requires
less computation and less memory storage. The computation load of eigenimage
stems from three part. One comes from the initial generation of eigenimage,
which can be eleviated by Kirby and Sorovich [2]. The other comes from
the update of eigenimage by increasing training set, which can be eleviated
by [1] at the cost of accuracy. The last computation load arise in doing
judgment with validation data, whose computation time is much less than
the previous ones.
Effect of Quantization
First we have tested the effect of quantization. Our raw data is 256 gray
level. From information theoretic view point, by making it binary gray
level, we lose a lot of information. But we don't know the effect of quantization
in doing judgment theoretically. In making binary gray level, our threshold
level (0.93/1.00) was decided to reflect the characteristics of numeral
clearly while suppressing unnecessary noise. Our results (Figure
1) show that using binary gray level doesn't degrade the judgment performance
greatly. So considering less memory storage, we choose the binary gray
level in store our training data.
Effect of Transformation
Now we also think of the way to incorporate the transformation method in
doing this decision. Our idea was that coefficients of certain transform
might have some distinct local characteristic per each numeral. We first
generated DCT coefficients and have investigated the shape carefully. But
there is no finding the specific local characteristic of each numeral though
some wide range trend was observed as shown below.
|
|
Figure 2. DCT coefficients for numeral 0,1 and 2
|
To use this general trend we performed the DCT and then we generated
eigenimage and did judgment. Just using DCT doesn't change the performance
(Figure 3). To our vision, the coefficients
looks meaningless while original image is meaningful. But mathematically
as DCT is unitary transform, which is merely the rotation. So as our all
judgment process is purely mathematical, DCT doesn't make any difference
in performance.
Effect of
Transformation Coefficients Reduction
We know that DCT concentrates the energy at the upper left corner of coefficient
image. So we try to use only upper left corner of the coefficients in generating
eigenimage. By trial and error we have found that 32 by 32 coefficients
reflect the original image quite well while requiring less memory storage
and computation. The result shows that even though there is a slight degradation,
if we use large number coefficients we can get almost the same level of
performance.
|
|
Figure 4. Effect of DCT Reduction on Error Rate
|
Effect
of Processing after Transformation Coefficients Reduction
We have found out that DCT coefficients reduction result (Figure
5) in ringing effect in original image domain. This is expected because
we did windowing in the DCT domain. But by thresholding in the time domain,
it would be possible that we make our numeral clearer as shown at the image.
So we have tested the effect of processing after transformation coefficients
reduction. But this (Figure 6) performed worse
than the original image while computation and memory storage requirement
is more than the original image (because we have to have 104 by 155 coefficients
not 32 by 32 because of post processing). Even though our trial (Figure
7) here was not good, it is noteworthy that we can maintain the certain
thickness of numeral by DCT and thresholding regardless of the thickness
of original numeral.
Optimal Eigenimage Method
|
|
Figure 8. Optimal Eigenimage Method
|
[TOP]
Feature-Based Method
In feature-based method, all the numeral images are normalized in size
and the features that characterize the numerals are searched within the
images ([4], [5]). These determining features are statistically common
features in 'normally' written numerals. Our feature-based method
is comprised of two main parts - the preprocessing and the single-stage
numeral classification stage as shown in the figure.
|
|
Figure 9. Block Diagram of Feature-Based Method
|
Preprocessing
Filtering
Since the raw data used in feature-based method are binary images of the
numerals after thresholding the 256 gray level scanned image, there are
spurious features outside the region of the numeral itself. These
features were mostly few pixels in extent resembling island like structures.
They needed to be eliminated to normalize the numeral images for feature
detections. A simple 3-by-3 median filter was used to blur the isolated
'blobs' and a morphological cleaning ([6]) process was applied to
rid majority of the spurious features (Figure
10). In our experiment, 98% of the images were completely filtered.
Normalization
For feature extraction, a normalized image of the numerals was the first
step in a robust comparison since the size of the handwritten numerals
varied vastly from person to person. Since the raw images were filtered,
region of the numerals has pixel values of 1's and the the rest of the
'empty' regions are filled with only 0's. Vertical and horizontal
histograms were used to close in on the region of the numeral - the bounds
of the numeral were utilized. This region was cut out and resized
to 32-by-32 using bilinear method. Most often the numerals scanned
were larger than size of 32-by-32, and the resizing resulted in decimation
of the image. In cases of numerals that were less than 32 pixels
in width or length, such as 1's and numerals written very small, there
was no resizing of the image. Rather it was zero-padded appropriately
on either side to make the image size of 32-by-32. If a numeral image
was interpolated, it resulted in unnecessary artifacts when the morphological
thinning was applied in the later processing.
Threshold
Since bilinear method was used in normalizing the images to 32-by-32 size,
the new images are transformed back to 256 gray level. For the feature-based
method, binary images suffice in the decision criteria, so the images were
changed back to binary image by thresholding. An empirical value
of 0.5 in the scale of 0 to 1 was found to be effective for our experiments.
An important observation is that different threshold levels results in
slightly different end results - preprocessing is a crucial factor in feature-based
method.
Thinning
In extracting the features of each numerals, thickness of the handwritten
numerals only hinder the correct detection, so the core parts of the handwriting
needed to be extracted. For such extraction of the core parts of
the image, well known and widely used morphological 'thinning' and 'skeletonizing'
methods were inspected ([6]). The 'skeletonizing' method results
in unwanted artifacts as the handwritten numerals became thicker, but the
'thinning' method did not have such problems. So morphological thinning
was used to thin out the numerals to single pixel width numerals.
In this process, some discontinuities in the numerals occurred, but a simple
algorithm of 'bridging' these discontinuities were used to maintain the
continuity in the numerals.
Classification
Single Stage Modules
In determining a number from the preprocessed numeral image, 10 'independent'
modules (module0 to module9) were used. Each module
has several criteria for determining a number, e.g. module0 has
all the criteria for the numeral image to be determined as a 0. In
another words, moduleX determines whether the numeral image is the
number X or not, and it determines independently from other modules.
A numeral image is input to all of the 10 modules and moduleX
determines whether the input image is a number X. There can
be more than one or no module that respond as positive to the input numeral
image. Being single stage, there is no algorithm to resolve the conflict
of multiple acknowledgment of the numeral modules.
Preliminary Criteria
Histogram
Vertical and horizontal histograms were used to extract features of vertical
and horizontal lines. From inspections of the handwritten numerals,
there was a tendency of writing the numerals vertically slanted to the
right. As a results, horizontal histograms were spread out providing
less prominent features compared to the vertical histograms. Using
the vertical histograms, horizontal lines of the numerals were easily detected.
Since more salient features of the numerals can be extracted from slices
relative distance from these horizontal lines, vertical histograms were
crucial in locating the individual features of the numerals. Both
vertical and horizontal histograms were more extensively utilized in finding
the 'bounds' of the numerals. These 'bounds' were similarly utilized
to find more salient features of each numerals. Histograms (Figure
11) were inspected globally on the numeral image.
Horizontal Division
As mentioned, handwritten numerals were mostly vertically slanted but mostly
not slanted in the horizontal direction. To narrow down on the local
features of the numerals, the images were horizontally divided into two
at upper half, half, and lower half points. A hard half division
was not very effective due to the fact that people have different handwriting.
For example (Figure 12), a 6 was divided in
the upper half so that the lower (larger) half encloses the loop feature
of the lower portion of numeral 6. For a 9, a division was made in
the lower half for the same reason. For each numeral, the divisions
were made to maintain the key features and separate those features into
two sub images. This division ensured more locality in feature extraction
of the numerals. Histograms were reapplied within the divided images
for further analysis.
Feature Extraction
The normalized image can be interpreted as a 32-by-32 matrix. A slice
is defined as a particular row or column of the matrix. In extracting
features to determine a numeral, these slices were inspected. The
slices were relative distance from the 'bounds' found earlier in the stage.
Only the transitions in the slice were inspected.
Number of transitions in the slice and the location of the transitions
were utilized in determining the numerals in each module. The location
of the slices, number of transitions, and the location of the transitions
inspected were all different according to the module. In other words,
there was no global slice applied in all of the modules, rather these criteria
were inspected 'independently' depending on the module.
|
|
|
Figure 13. Examples of Feature Extraction
|
|
All the criteria had to be satisfied to be classified as a certain numeral.
The number of criteria inspected was typically around 6 to 7 per each module.
Secondary Criteria (Enhancement)
Recognition of the numerals by mentioned features had limits in discriminating
numerals that were more difficult to detect. Another easily seen
feature of each numeral were the 'end-points'. This takes into account
how humans usually write numerals - the writing starts at a point and ends
at another (or the same) point. The points where the writing starts
and ends are defined as the 'end-points'. If these end-points are
utilized in determining what the numeral of the image is, the restrictions
of the slice information can be made less stringent. Addition of
the end-point features in the recognition of the numeral results in improvement
in error rate of detection. The more prominent improvements (Figure
14) occur in the hard-to-detect numerals.
|
|
|
|
|
|
|
|
|
|
|
Figure 15. Examples of Numeral Image and Its End-points
|
|
|
|
|
[TOP]
Results
The resulting error percentage using our own database of 500 images and
the public domain database is shown in the following figures.
|
|
Figure 16. Error Percentage Using Own Database
|
In eigenimage method, performance is generally bad except at numeral
1. In case of numeral 1, the characteristic of numeral is so distinct that
decision error seldom happens. But for other numerals, eigenimage method
alone is not sufficient to guarantee good performance. Numeral 6 is misjudged
as numeral 0 or numeral 8 and vice versa. Numeral 7 is also misjudged by
numeral 1 or numeral 9.
In feature-based method, the rejection rate and the incorrect rate
showed tradeoff relationship in the process of the experiments improving
the algorithm. Efforts were made to lower the incorrect rate because
numeral images that are rejected are more likely to be classified correctly
as more features are incorporated and multi-stage algorithms are developed.
On the other hand, incorrectly classified numerals have no chance for improvement
as the algorithm develops. The result shows the incorrect rate being
less than 6% in all the numerals considered. The highest rejection
rate in the numeral 7 is due to the fact that there were two major ways
a 7 was written by a person, and the classification of 7 could not cover
the range of different looking numeral images fully.
|
|
Figure 17. Error Percentage Using Public Domain Database
|
In eigenimage method as shown in our database performance in numeral
8 and numeral 9 is really bad. Numeral 8 is misunderstood as numeral 0
or numeral 9, and numeral 9 is misunderstood as numeral 0 or 7. Though
performance is relatively better than feature-based method in numeral 0,
1, 2, 3, 7, still far from the state-of-art performance.
The error percentage of the feature-based method using the public domain
database is the sum of the incorrect and the rejected percentage.
The algorithm breaks down and a higher error percentage is visible.
The reason for this discrepancy in performance is mainly due to the preprocessing
of the numeral images. First, the images in the public domain database
was already normalized to 32-by-30. Second, within that size of the
image, the numerals were twice as thick as the numerals acquired from our
own database which give rise to unwanted artifacts in the thinning process
used in the algorithm. Third, the numerals in the public domain database
do not fully occupy the 32-by-30 region, meaning that detection of features
has to be done in a more confined region with fewer pixels being used in
the normalized 32-by-32 image.
[TOP]
Conclusions
In eigenimage method, we have shown that coefficient reduction after DCT
is the most efficient way to do preprocessing. In feature-based method,
the percentage of correct classification depends heavily on the preprocessing
of the images and has more potential for improvement. Results of
both methods show different performance numeral by numeral. For some
numerals more elegant approach (e.g. neural network, multi-stage decision
scheme) should be combined to enhance the performance. And combination
of eigenimage and feature-based technique could be reasonable future direction
since we have seen that performance of each method differs a lot numeral
by numeral.
[TOP]
References
[1] Ching Y. Suen et al., Computer Recognition of Unconstrained Handwritten
Numerals, Proceedings of the IEEE, Vol. 80, No. 7, July 1992, pp.1162-1180
[2] L. Sirovich and M. Kirby, Low Dimensional Procedure for the Characterization
of Human Faces, J. Opt. Soc. Amer. A, Vol. 4, No. 3, March 1987,
pp. 519 – 524
[3] M. Turk and A. Pentland, Face Recognition using Eigenfaces, IEEE
Conf. Computer Vision Patt. Recogn., June 1991, pp. 586 - 591
[4] P. Theodosis, A. Farhat, Computer Recognition of Handwritten Numerals
by Polygonal Approximations, IEEE Transactions on Systems, Man, and
Cybernetics, Vol. SMC-5, No. 6, November 1975, pp.610-614
[5] F. Kimura, M. Shridhar, Handwritten Numeral Recognition Based on
Multiple Algorithms, Pattern Recognition, Vol. 24, No. 10, 1991,
pp.969-983
[6] R. Gonzalez, R. Woods, Digital Image Processing, Addison-Wesley,
1992
[TOP]
Appendix
Though strong collaboration was sought throughout the entire project, specific
implementation of eigenimage method was done by Seong Taek Chung and the
feature-based method was done by Sung-Won Yoon.
[TOP]