With the ongoing reduction in size and cost of computing and optical
Particularly notable is the U.S. Department of Transportation's
Intelligent Vehicle Initiative.
Driver drowsiness is one specific form of human error that has been well studied.
Consequently, we attempted to devise a camera + image processor system to take
After trying to use cameras in the visible range of the spectrum to detect
Below are some examples of using an "ordinary" digital camera during
day and nighttime driving conditions. As you can see, the particular
camera we were using does not capture enough light to handle both
situations. We needed an imaging system that could handle both daytime
and nighttime conditions, so, following examples in the literature, we
chose an IR camera.

The daytime images are especially encouraging, since only the eyes seem to
show up in the picture at all.

Neural networks are programming abstractions that attempt to make
decisions based on a complex network. By training this network, the
neural network attempts to classify how close an input is to the
space spanned by the training set. Thresholding this closeness
measure produces a result. Neural networks are notorious for being
difficult to train.

Originally, we planned to take some code developed for another
class to perform face detection and modify it to perform eye
detection. Conveniently, MATLAB has a neural network toolbox, which
that implementation used. It seemed like a good chance to learn
about neural networks and their applications to image processing.
However, as you will see below, we quickly abandoned this idea for
more conventional image processing techniques.

Correlation Methods

Template matching is one possible technique for searching for a pattern
within an image. Typically, a suitable "template" is chosen as a feature
to search for within an image. In our case, we chose an "average" eye
by averaging over test cases. An example of an average is shown
below.

Template matching finds the minimum error between "windows" of the
image and the template. Alternatively, it can be viewed as searching for a maximum of
the correlation of the image with the template (that is, a convolution
with the flipped template). However, to avoid a bias
towards bright sections of the image, each window should be "mean removed"
to ensure proper correlation.

Hough Transform

The Hough transform is a transform that searches for maxima in a parametric
space. Thus, any "shapes" that can be expressed parametrically are well
suited to techniques using the Hough transform. In our images, the pupils
of the eye formed nearly perfect circles. If we restrict the distance to
the camera, this also roughly fixes the radius of the pupil at 5 pixels.

Once we find the maxima in the parametric space, we perform the inverse
Hough transform to determine where the original circles are in the image.
Subsequent processing can occur in these regions to further increase SNR.

Our implementation of the Hough transform for circular objects is based on
this code developed at
the University of Minnesota by Dan Pou.

We attempted to use a MATLAB neural network
face recognition routine
developed by Scott Sanner for CS223B. Our original idea was
to modify this routine to detect eyes. Though the previously noted literature by
Wierwille
points to neural networks for pattern recognition as a promising approach,
this routine had severe difficulty actually detecting
faces, most likely due to insufficient or improper training. A selected
result is shown for the face detection implementation. It doesn't seem to find a face at all!

After experimenting with this method, we dropped it and developed our own
techniques from methods presented in class.
As the eye closure occurrences dramatically increase during the 10-second
Theory and Methodology
Using IR illumination and IR camera
illuminated with IR light
Neural Network


Results and Discussion
Neural Network
Correlation Methods
Correlation is much harder than it sounds. First, we did not implement a true correlation function that removes the mean of each window; performing correlation on the mean-removed image and the mean-removed window is quite tricky. Many spurious maxima are found in the image, including eyebrows, hair, and nostrils.
To combat this, we used many "tricks" to try to zero in on the eyes themselves. In one implementation, we first search for the maximum correlation with the "facemask" to find the general area of the eyes and nose. Then we search only in that area for the eyes themselves. This improves performance compared to a global eye search over the whole image.
Another problem is determining what the "eye" and "face" templates should look like. Different people have different eye shapes and sizes. In addition, subjects will be closer to or farther from the camera, changing the eye's size relative to any fixed template. Moreover, different illumination conditions will change the response of the pupils to IR light. This makes template matching a hard problem indeed.
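As an illustration of the mean removal discussed above, here is a minimal Python sketch of template matching with mean-removed windows. It is not our MATLAB code: the function names `zero_mean` and `match_template` are hypothetical, images are assumed to be plain lists of rows of gray levels, and the score is additionally divided by the window and template norms (an assumption on our part) so that high-contrast regions do not dominate.

```python
import math

def zero_mean(patch):
    """Flatten a 2-D patch (list of rows) and subtract its mean."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def match_template(image, template):
    """Return (row, col, score) of the window that best matches the template.

    The score is the zero-mean normalized cross-correlation, in [-1, 1].
    Normalizing by the norms is an illustrative choice, not the report's method.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t = zero_mean(template)
    t_norm = math.sqrt(sum(v * v for v in t))
    best = (0, 0, -2.0)  # any real score is >= -1
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = [image[r + i][c:c + tw] for i in range(th)]
            w = zero_mean(window)
            w_norm = math.sqrt(sum(v * v for v in w))
            if w_norm == 0 or t_norm == 0:
                continue  # flat window or template: correlation undefined
            score = sum(a * b for a, b in zip(w, t)) / (w_norm * t_norm)
            if score > best[2]:
                best = (r, c, score)
    return best
```

Because every window has its mean removed, a uniformly bright patch (a flat window) contributes nothing, which is the bias that mean removal is meant to avoid. The exhaustive double loop is slow on full frames; restricting the search to the region found by a coarser "facemask" pass, as described above, reduces the cost.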
The large bright center correctly indicates the location of the center of the eyemask in the test image.
Although local maxima occur at the eye locations, the global maximum occurs in the lower right section of the image, where the hair and background intermix.
Below are results of the implementation on different video frames. A rough estimate puts eye location accuracy at about 75%.
Error checking can be done on video to ensure that the proper eyes are being found. Constraints such as the "maximum" motion of the eyes from frame to frame, the last position of the eyes, the distance between the eyes, etc. can reduce spurious eye detections. The results are in "nathanprocessed.avi".
Here this correlation implementation seems to work quite well. The places where the correlation does not map are places where the subject has blinked. In fact, a simple blink counter (treating blinks as consecutive frames of unfound eyes) correctly estimates the number of blinks at 3 for this sequence.
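The blink counter idea can be sketched in a few lines of Python. This is an illustration only: the function name `count_blinks`, the per-frame boolean input, and the `min_frames` parameter are assumptions, not part of our MATLAB code.

```python
def count_blinks(eyes_found, min_frames=1):
    """Count blinks as runs of consecutive frames in which no eyes were found.

    eyes_found: iterable of booleans, one per video frame.
    min_frames: shortest run of missing-eye frames that counts as a blink
                (hypothetical parameter for rejecting single-frame dropouts).
    """
    blinks = 0
    run = 0  # length of the current run of frames without eyes
    for found in eyes_found:
        if found:
            if run >= min_frames:
                blinks += 1
            run = 0
        else:
            run += 1
    if run >= min_frames:
        blinks += 1  # the sequence ended in the middle of a blink
    return blinks
```

Raising `min_frames` trades sensitivity for robustness: a single frame in which detection fails for some other reason is then no longer counted as a blink.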
Hough Transform
When attempting circle detection with the
Hough Transform, it is important to remember that the function depends
on a black/white edge map (shown on the bottom left of the animation). We used
an edge detection function with varying thresholds to find the pupils with the
least amount of background noise. In the first example, this threshold had
to be lowered to such a level that it also captured a lot of background edges
in the hair and ears. Unfortunately, when this edge map is passed into the
Hough detection function, it shows that there are many potential circles with a
radius of five pixels in the image (shown on the bottom right of the
animation). This Hough image is then morphologically thresholded (using the
erode and dissolve functions) so that the most likely circles are
distinguished from the noise (top right of animation), and the corresponding
coordinates are plotted on the original image.
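The voting step for circles of a fixed radius can be sketched as follows. This Python sketch is not Dan Pou's code or our MATLAB implementation; the function names are hypothetical, the edge map is assumed to be a list of rows of 0/1 values, and the morphological clean-up is replaced here by simply taking the strongest accumulator cell.

```python
import math

def circle_offsets(radius):
    """Integer (dy, dx) offsets lying within half a pixel of a circle of this radius."""
    reach = radius + 1
    return [
        (dy, dx)
        for dy in range(-reach, reach + 1)
        for dx in range(-reach, reach + 1)
        if abs(math.hypot(dy, dx) - radius) < 0.5
    ]

def hough_circle_accumulator(edges, radius):
    """Let every edge pixel vote for all centers that lie `radius` away from it."""
    height, width = len(edges), len(edges[0])
    acc = [[0] * width for _ in range(height)]
    offsets = circle_offsets(radius)
    for y in range(height):
        for x in range(width):
            if not edges[y][x]:
                continue
            for dy, dx in offsets:
                cy, cx = y + dy, x + dx
                if 0 <= cy < height and 0 <= cx < width:
                    acc[cy][cx] += 1
    return acc

def strongest_center(acc):
    """Return (row, col, votes) of the accumulator cell with the most votes."""
    votes, y, x = max((v, y, x) for y, row in enumerate(acc) for x, v in enumerate(row))
    return y, x, votes

# Synthetic edge map: a single circle of radius 5 centered at row 10, column 12.
edges = [[0] * 24 for _ in range(24)]
for dy, dx in circle_offsets(5):
    edges[10 + dy][12 + dx] = 1
center = strongest_center(hough_circle_accumulator(edges, 5))
```

On a real edge map, background edges also cast votes, which is why the accumulator shows many candidate circles and needs further thresholding before the eye locations can be trusted.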
Error Checking
The animation to the left uses the same processes mentioned above, except that an error-checking filter is also applied. This filter only passes potential eye locations that are the correct distance apart (which we assume to be constant within 10%). Other improvements include an angle analysis method that weights potential eye locations by their angle with the horizontal. (This is based on the assumption that the eyes will typically form an angle close to zero.) Although not perfect, this seems to work with over 80% accuracy.
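A filter of this kind might look like the following Python sketch. It is a simplification under stated assumptions: the function name `pick_eye_pair`, the `(x, y)` candidate format, and the `max_angle_deg` cutoff are hypothetical, and instead of weighting candidates by angle it simply keeps the most nearly horizontal pair that passes the distance test.

```python
import math
from itertools import combinations

def pick_eye_pair(candidates, expected_dist, dist_tol=0.10, max_angle_deg=20.0):
    """Pick the most plausible pair of eye locations from (x, y) candidates.

    A pair passes if its separation is within dist_tol (10% by default) of
    expected_dist and its tilt from the horizontal is at most max_angle_deg
    (an assumed cutoff). Among passing pairs, the flattest one is returned.
    Returns None when no pair passes.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(candidates, 2):
        dist = math.hypot(x2 - x1, y2 - y1)
        if abs(dist - expected_dist) > dist_tol * expected_dist:
            continue  # wrong separation for a pair of eyes
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1)))
        angle = min(angle, 180.0 - angle)  # tilt, independent of point order
        if angle > max_angle_deg:
            continue  # too far from horizontal
        if best is None or angle < best[0]:
            best = (angle, ((x1, y1), (x2, y2)))
    return best[1] if best else None
```

The same structure extends to the other constraints mentioned in this report, such as limiting how far the chosen pair may move between consecutive frames.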
We had a lot of trouble using a neural network. Training the network seems unable to handle the various conditions inherent to this problem. Template matching gave better results, but a similar (though less severe) training problem occurs. In addition, in the presence of noise, various image features can attract the template better than the actual feature itself, leading to spurious measurements. Similarly, with the Hough transform, spurious edges can have radii similar to that of the actual eye, causing false measurements. The addition of error checking, such as constraints on head motion, eye-to-eye distance, and head angle, drastically improves the performance of both techniques.
Dan Pou, "Image Processing Homework 5", Hough Circle Transform. http://www.ece.umn.edu/users/dpou/hw1-5.html University of Minnesota.
M. Yang, D. Kriegman, N. Ahuja, Detecting Faces in Images: A Survey, Department of Computer Science and Beckman Institute Technical Monograph, University of Illinois at Urbana-Champaign, Urbana IL, 61801
H. Rowley, S. Baluja, and T. Kanade, "Neural Network-Based Face Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, January, 1998, pp. 23-38.
Sanner, Scott, "CS223B Winter quarter, Final Project" http://www.stanford.edu/~sanner/Vision/Project.html
M. Eriksson and N. Papanikolopoulos. Eye tracking for detection of driver fatigue. In IEEE Conference on Intelligent Transportation Systems, pages 314-319, 1997.
M. Funada, S. Ninomija, S. Suzuki, I. Idogawa, Y. Yazu, and H. Ide. On an image processing of eye blinking to monitor awakening levels of human beings. In 18th Annual International Conference of the IEEE Engineering in Medicine and Biology, volume 3, pages 966-967, 1996.
S. Kumakura. Apparatus for estimating the drowsiness level of a vehicle driver. U.S. patent no. 5786765.
Hernandez-Gress, N. Driver drowsiness detection : past, present and prospective work, N. Hernandez-Gress and D. Esteve. Traffic technology international. June/July 1997
Katahara, Shunji. Driver drowsiness detection by eyelids movement from face image, Shunji Katahara, Satoko Nara and Masayoshi Aoki (Seikei University). World Congress on Intelligent Transport Systems (2nd : 1995 : Yokohama-shi, Japan). Steps forward. Vol. 3. Tokyo, Japan : VERTIS, 1995.
Research on vehicle-based driver status/performance monitoring : development, validation, and refinement of algorithms for detection of driver drowsiness, W.W. Wierwille ... et al., Washington, DC, National Highway Traffic Safety Administration, 1994.
Sherman, Peter J. The potential of steering wheel information to detect driver drowsiness and associated lane departure, Peter J. Sherman, Michael Elling, Monty Brekke. Ames, Iowa : Midwest Transportation Center, Iowa State University, 1996.
Taoka, George T. Driver drowsiness and falling asleep at the wheel, George T. Taoka. Transportation quarterly. Vol. 47, no. 4 (Oct. 1993)
Wierwille, Walter W. Development of improved algorithms for on-line detection of driver drowsiness, Walter W. Wierwille, Stephen S. Wreggit, Ronald R. Knipling. International Congress on Transportation Electronics (1994 : Dearborn, Mich.). Leading change. Warrendale, PA : Society of Automotive Engineers, 1994.
Wierwille, Walter W. Evaluation of driver drowsiness by trained raters, Walter W. Wierwille and Lynne A. Ellsworth. Accident analysis and prevention. Vol. 26, no. 5 (Oct. 1994)
Dion wrote the convolution "template matching" code and some error checking code. He also wrote many sections of the report, including the discussion of IR light, the template matching sections, the theory of the Hough transform section, and, with Nathan, the conclusions. He performed the painstaking task of making sure the links worked (there must be a better way!) and worked on many of the error checking algorithms along with Nathan. His code includes:
Below are "main" programs to loop through video, possibly including some error checking:
Nathan unsuccessfully attempted to adapt the neural network functions to be more responsive to drivers' faces. He started the research on IR cameras by building an IR flashlight and discovered the unique IR signature of a face in daylight. He also wrote the code which detects eyes using the Hough Transform (with help from some of Dion's error checking code). For the report, he wrote the results and discussion section on the Hough Transform, handled the processing of the AVIs, and created the animated GIFs.
ben.m
Ben performed most of the literature search, built IR light sources that were later replaced by a commercial camera (in the end, we used a Sony Digital HandyCam in its "NightShot" mode, where the camera has an IR light source and some sort of filtering for IR), took digital images and videos, and worked (with Dion and Nathan) on code to convert the images to a Unix MATLAB-readable format. He wrote the introduction to the report, debugged the HTML code, and edited the entire report for clarity and style.