Automatic Classification of Subjective Qualities in Facial Images

Ali Ozer Ercan, Mark Kalman, Harvey Thornburg

Applications such as web-based dating and matrimonial services continue to grow in popularity. In using these services, participants submit their photographs so that the web community can vote on their attractiveness. A key logistical problem is that users must manually search a large corpus of images and evaluate each one subjectively. We propose a system that automatically rates the attractiveness of a facial image by learning the user's preferences from a set of training images.

Our project proceeds in several steps. First, we preprocess the images to make them homogeneous in size and orientation. Faces must be scaled and centered, and the background must be set to a fixed constant value that is distinguishable from the "face" part of the image. For each preprocessed image, we ask human subjects to rate "attractiveness" (of the whole face, then possibly of individual features such as eyes, lips, nose, etc.) on a scale of 1-10. We then randomly divide the set of images into a training set and a test set.

On the training set, we first apply PCA to reduce the dimensionality of the feature space. Next, for each class of subjective attribute (e.g. overall attractiveness, attractiveness of eyes, etc.) we estimate a pdf and prior probabilities. To classify a new image, we project it onto the feature space and note which region it falls in. Classification regions are chosen to minimize expected "cost", given the associated pdfs, priors, and a cost function. The cost function is chosen to penalize different kinds of classification errors unequally; e.g. assigning an attractiveness of "10" when the true attractiveness is "1" is certainly a more costly mistake than mistaking a "5" for a "6".

We will use the test set of images to evaluate the performance of our system.
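The training, classification, and test steps described so far can be sketched roughly as follows. This is only an illustration under several assumptions that the proposal does not fix: the class-conditional pdfs are modeled as Gaussians, the cost of a mistake is the squared difference between the true and the assigned rating, only three rating classes are used for brevity, and the "images" are synthetic 64-pixel vectors. All function names (fit_pca, fit_gaussians, classify_min_cost) are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def fit_gaussians(Z, y, classes, reg=1e-3):
    """Estimate a prior and a Gaussian pdf (mean, covariance) for each class."""
    params = []
    for c in classes:
        Zc = Z[y == c]
        prior = len(Zc) / len(Z)
        cov = np.cov(Zc, rowvar=False) + reg * np.eye(Z.shape[1])  # small ridge
        params.append((prior, Zc.mean(axis=0), cov))
    return params

def log_gaussian(Z, mu, cov):
    """Log-density of each row of Z under a Gaussian with mean mu, covariance cov."""
    d = Z - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
    return -0.5 * (maha + logdet + Z.shape[1] * np.log(2 * np.pi))

def classify_min_cost(Z, params, cost):
    """For each row of Z, return the class index with the least expected cost.
    cost[i, j] is the cost of deciding class j when the true class is i."""
    log_post = np.stack(
        [np.log(p) + log_gaussian(Z, mu, cov) for p, mu, cov in params], axis=1)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # posterior per class
    return np.argmin(post @ cost, axis=1)             # expected cost per decision

# Synthetic stand-in data (assumption): three rating classes, 64-pixel "faces".
rng = np.random.default_rng(0)
classes = np.array([0, 1, 2])
y = np.repeat(classes, 60)
direction = rng.normal(size=64)
direction /= np.linalg.norm(direction)
X = rng.normal(size=(180, 64)) + 4.0 * y[:, None] * direction

# Random split into a training set and a test set.
idx = rng.permutation(len(X))
train, test = idx[:120], idx[120:]

mean, comps = fit_pca(X[train], k=5)

def project(A):
    """Project images onto the PCA feature space learned from the training set."""
    return (A - mean) @ comps.T

params = fit_gaussians(project(X[train]), y[train], classes)
cost = (classes[:, None] - classes[None, :]) ** 2.0   # assumed squared-error cost
pred = classify_min_cost(project(X[test]), params, cost)
mean_cost = cost[y[test], pred].mean()                # average cost on the test set
print("mean test cost:", mean_cost)
```

With a cost matrix that grows with the rating gap, the decision rule tends to avoid extreme ratings unless the posterior strongly supports them, which is the behavior the cost function is meant to encourage. A real system would replace the synthetic data with the preprocessed face images and the human ratings.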
We classify each image in the test set as described above and compare the results with the human evaluations. We measure the performance of our system in terms of our classification error cost function. If our classification regions are linear, we can use the principal (normal) vector of the separating hyperplane to determine the most important feature for classifying attractiveness. This vector is itself an image, so it should be interesting to interpret. We can also project the training data onto this hyperplane and repeat the classification to find "surrogate" features, in decreasing order of importance.