Real-time face analysis for gender recognition on video sequences
Abstract
This research work aims at performing gender recognition in real time on face images extracted from real video sequences. The task may appear easy for a human, but it is not so simple for a computer vision algorithm. Even on still images, gender recognition classifiers have to deal with challenging problems, mainly due to face variations in terms of age, ethnicity, pose, scale, occlusions and so on.
Additional challenges arise when face analysis is performed on images acquired in real scenarios with traditional surveillance cameras. Indeed, people are unaware of the presence of the camera, and their sudden movements, together with the low quality of the images, further increase the noise affecting the faces, which suffer from motion blur, different orientations and various scales. Moreover, the need to provide a single classification per person (rather than one for each face image) in real time requires a fast gender recognition algorithm, able to track a person across frames and to provide the gender information quickly.
The real-time constraint becomes even more relevant considering that one of the goals of this research work is to design an algorithm suitable for an embedded vision architecture.
Finally, the task becomes even more challenging since there are no standard benchmarks and protocols for the evaluation of gender recognition algorithms.
In this thesis the attention has first been focused on the analysis of still images, in order to identify the most effective features for gender recognition. To this aim, a face alignment algorithm has been applied to the face images so as to normalize the pose and optimize the performance of the subsequent processing steps. Then two methods have been proposed for gender recognition on still images.
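The abstract does not detail the alignment procedure; the following is a minimal sketch, assuming a common eye-based alignment in which the face is rotated and scaled so that the eyes land at fixed positions in the output crop (function name, eye coordinates and output size are illustrative assumptions).

import cv2
import numpy as np

def align_face(image, left_eye, right_eye, out_size=(128, 128)):
    # Illustrative eye-based alignment, not the thesis' exact procedure.
    # Rotation angle and scale that bring the inter-eye segment to a
    # canonical horizontal position with a fixed length.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    eyes_center = ((left_eye[0] + right_eye[0]) / 2.0,
                   (left_eye[1] + right_eye[1]) / 2.0)
    scale = (0.5 * out_size[0]) / np.hypot(dx, dy)   # assumed inter-eye distance
    M = cv2.getRotationMatrix2D(eyes_center, angle, scale)
    # Translate so that the eye midpoint lands at a fixed output location.
    M[0, 2] += 0.5 * out_size[0] - eyes_center[0]
    M[1, 2] += 0.35 * out_size[1] - eyes_center[1]
    return cv2.warpAffine(image, M, out_size)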
First, a multi-expert system which combines the decisions of classifiers fed with handcrafted features has been evaluated. The pixel intensity values of the face images (namely the raw features), the LBP histograms and the HOG features have been used to train three experts which take their decisions by exploiting, respectively, the color, texture and shape information of a human face. The decisions of the single linear SVMs have been combined with a weighted voting rule, which proved to be the most effective for the problem at hand.
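As an illustration, a minimal sketch of this multi-expert scheme is reported below, assuming grayscale face crops of fixed size; the feature parameters, the label convention and the expert weights are illustrative choices and not the exact configuration adopted in the thesis.

import numpy as np
from skimage.feature import local_binary_pattern, hog
from sklearn.svm import LinearSVC

def raw_features(face):
    # Intensity (color) expert: the normalized pixel values themselves.
    return face.ravel().astype(np.float32) / 255.0

def lbp_features(face):
    # Texture expert: histogram of uniform LBP codes (59 bins for P=8).
    lbp = local_binary_pattern(face, P=8, R=1, method="nri_uniform")
    hist, _ = np.histogram(lbp, bins=59, range=(0, 59))
    return hist / (hist.sum() + 1e-9)

def hog_features(face):
    # Shape expert: histogram of oriented gradients.
    return hog(face, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_experts(faces, labels):
    # One linear SVM per handcrafted feature type.
    experts = []
    for extract in (raw_features, lbp_features, hog_features):
        X = np.array([extract(f) for f in faces])
        experts.append((extract, LinearSVC().fit(X, labels)))
    return experts

def weighted_vote(experts, face, weights=(1.0, 1.0, 1.0)):
    # Each expert casts a vote (+1/-1) weighted by its assumed reliability;
    # the sign of the weighted sum gives the final gender decision.
    score = sum(w * (1.0 if clf.predict([extract(face)])[0] == 1 else -1.0)
                for (extract, clf), w in zip(experts, weights))
    return 1 if score >= 0 else -1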
Second, an SVM classifier with a chi-squared kernel based on trainable COSFIRE filters has been fused with an expert which relies on SURF features extracted at certain facial landmarks. The complementarity of the two experts has been demonstrated, and their decisions have been combined with a stacked classification scheme.
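The sketch below illustrates the stacked fusion only: the COSFIRE responses and landmark-level SURF descriptors are assumed to be precomputed (they are not provided in this form by standard Python libraries), and the choice of a logistic-regression meta-classifier is an assumption made for illustration.

import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import chi2_kernel

def train_stacked_fusion(cosfire_X, surf_X, labels):
    # Expert 1: SVM with a chi-squared kernel on (non-negative) COSFIRE
    # filter responses.
    cosfire_clf = SVC(kernel=chi2_kernel, probability=True).fit(cosfire_X, labels)
    # Expert 2: SVM on SURF descriptors extracted at facial landmarks
    # (a linear kernel is assumed here).
    surf_clf = SVC(kernel="linear", probability=True).fit(surf_X, labels)
    # Stacked scheme: a meta-classifier combines the two experts' posterior
    # probabilities into the final decision (in practice the meta-level is
    # trained on held-out predictions to avoid overfitting).
    meta_X = np.column_stack([cosfire_clf.predict_proba(cosfire_X)[:, 1],
                              surf_clf.predict_proba(surf_X)[:, 1]])
    meta_clf = LogisticRegression().fit(meta_X, labels)
    return cosfire_clf, surf_clf, meta_clf

def predict_stacked(models, cosfire_x, surf_x):
    cosfire_clf, surf_clf, meta_clf = models
    meta_x = np.array([[cosfire_clf.predict_proba([cosfire_x])[0, 1],
                        surf_clf.predict_proba([surf_x])[0, 1]]])
    return meta_clf.predict(meta_x)[0]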
An experimental evaluation of all the methods has been carried out on the GENDER-FERET and LFW datasets with a standard protocol, thus allowing a fair comparison of the results. Such evaluation showed that the COSFIRE-SURF combination achieves the best accuracy in all cases (94.7% on GENDER-FERET and 99.4% on LFW), even when compared with other state-of-the-art methods. Nevertheless, the performance achieved by the multi-expert system based on the fusion of the RAW, LBP and HOG classifiers can also be considered very satisfactory (93.0% on GENDER-FERET and 98.4% on LFW)...[edited by Author]