A new 3D face database that includes a rich set of expressions, systematic variation of poses and different types of occlusions is presented in this paper. Hence, this new database can be a very valuable resource for development and evaluation of algorithms on face recognition under adverse conditions and facial expression analysis as well as for facial expression synthesis.

Introduction

In recent years face recognizers using 3D facial data have gained popularity due to their lighting and viewpoint independence. This has also been enabled by the wider availability of 3D range scanners.
Most of the existing methods for facial feature detection and person recognition assume frontal and neutral views only, and hence biometric systems have been adapted accordingly. However, this may be uncomfortable for the subjects and limits the application domains. Therefore, the newly emerging goal in this field is to develop algorithms that work with the natural and uncontrolled behaviour of subjects. A robust identification system should also cope with subjects who try to avoid being recognized by posing awkwardly and, worse still, by resorting to occlusions via dangling hair, eyeglasses, facial hair and other accessories.
Once the expression is recognized, this information can also be used to help the person identifier. Various databases for 3D face recognition and occasionally 3D expression analysis are available.
Most of them are focused on recognition and hence contain a limited range of expressions and head poses. Also, none of them contain face occlusions. Every subject displays four intensity levels of the six emotions. Table I lists publicly available databases of relevance and compares them with our database.

Table I. List of some well-known 3D face databases (columns: total, expression, pose, occlusion).
Finally, in order to achieve more natural-looking expressions, we have employed actors and actresses from professional theatres, opera and the conservatory school. The content of the database is given in Section 2, and data acquisition is explained in Section 3.
In Section 4 the acquired data are evaluated. Finally conclusion is given in Section 5. Many of the male subjects have facial hair like beard and moustache. The majority of the subjects are aged between 25 and There are 51 men and 30 women in total, and most of the subjects are Caucasian.
There are total of face scans. Each scan has been manually labelled for 24 facial landmark points such as nose tip, inner eye corners, etc, provided that they are visible in the given scan.
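The landmark convention described above (24 manually labelled points, recorded only when visible in a given scan) can be sketched as a small data structure. This is a minimal illustration, not the database's actual file format: the class name `FaceScan`, the field names, and the landmark keys are hypothetical, and a missing (non-visible) landmark is represented by `None` as an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class FaceScan:
    """One scan with its manually labelled landmarks (hypothetical layout)."""
    subject_id: str
    # Landmark name -> 3D coordinate, or None when the point is not
    # visible in this scan (e.g. hidden by pose or occlusion).
    landmarks: Dict[str, Optional[Point3D]] = field(default_factory=dict)

    def visible_landmarks(self) -> Dict[str, Point3D]:
        """Return only the landmarks that were actually labelled."""
        return {name: p for name, p in self.landmarks.items() if p is not None}


# Example: a rotated scan in which one inner eye corner is hidden.
scan = FaceScan(
    subject_id="subject_000",  # hypothetical identifier
    landmarks={
        "nose_tip": (0.0, 0.0, 95.0),
        "inner_eye_corner_left": (-15.0, 30.0, 70.0),
        "inner_eye_corner_right": None,  # not visible in this scan
    },
)
```

Keeping non-visible points as explicit `None` entries, rather than dropping them, lets downstream code distinguish "not labelled because hidden" from "landmark name unknown".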
These feature points are given in Table II. The database has two versions: Bosphorus v. Bosphorus v. There are 47 people with 53 different face scans per subject. In total, there are 34 expressions, 13 poses, four occlusions and one or two neutral faces. In addition, Bosphorus v. In the following subsections, the collected facial expressions, head poses and occlusions are explained.
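The per-subject figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses counts stated in the text (34 expressions, 13 poses, four occlusions, one or two neutral faces, 47 people); the function names are illustrative, and the total is an upper bound because some expressions are missing for some subjects.

```python
# Counts taken from the text.
EXPRESSIONS = 34
POSES = 13
OCCLUSIONS = 4
NEUTRAL_MIN, NEUTRAL_MAX = 1, 2  # "one or two neutral faces"


def scans_per_subject(neutral: int) -> int:
    """Number of scans for one subject given the number of neutral faces."""
    if not NEUTRAL_MIN <= neutral <= NEUTRAL_MAX:
        raise ValueError("the text states one or two neutral faces")
    return EXPRESSIONS + POSES + OCCLUSIONS + neutral


def max_total_scans(subjects: int) -> int:
    """Upper bound on scans if every subject had the full set."""
    return subjects * scans_per_subject(NEUTRAL_MAX)


print(scans_per_subject(2))   # 53, matching "53 different face scans per subject"
print(max_total_scans(47))    # 2491, an upper bound for 47 people
```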
AUs are assumed to be building blocks of expressions, and thus they can give broad basis for facial expressions. Since each action unit is related with the activation of a distinct set of muscles, they can be assessed quite objectively. Although there are 44 AUs in general, we have collected a subset which consists of those AUs that are easier to enact. In the second set, facial expressions corresponding to certain emotional expressions were collected.
These are: happiness, surprise, fear, sadness, anger and disgust. It is stated that these expressions are universal among human races. During acquisition of each action unit, subjects were given explanations about these expressions and were given feedback if they did not enact them correctly.
Also, to facilitate the instructions, a video clip showing the correct facial motion for the corresponding action unit was displayed on the monitor. However, in the case of emotional expressions, there were no video or photo guidelines, so subjects had to improvise. Only if they were unable to enact an expression were they told to mimic the expression in a recorded video.
Moreover, a mirror was placed in front of the subjects in order to let them check themselves. In Table III. Also, Fig. These facial images are rendered with texture mapping and synthetic lighting.
Expressions in the Bosphorus database. A sample image for each expression is shown at the bottom part.

There are three types of head poses, which correspond to seven yaw angles, four pitch angles, and two cross rotations that incorporate both yaw and pitch.
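The relationship between the three pose types can be illustrated with rotation matrices: a cross rotation is a composition of a yaw and a pitch rotation. The sketch below is only an illustration under stated assumptions: the axis convention (yaw about the vertical y axis, pitch about the horizontal x axis), the composition order, and the example angles are all assumptions, since the text does not specify them.

```python
import math


def rot_yaw(deg):
    """Rotation about the vertical (y) axis; axis choice is an assumption."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_pitch(deg):
    """Rotation about the horizontal (x) axis; axis choice is an assumption."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(r, p):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3))


def cross_rotation(yaw_deg, pitch_deg):
    """Pitch first, then yaw (the order is an assumption of this sketch)."""
    return matmul(rot_yaw(yaw_deg), rot_pitch(pitch_deg))


# A point straight ahead of the face, rotated by a hypothetical cross pose.
forward = (0.0, 0.0, 1.0)
rotated = apply(cross_rotation(45.0, 20.0), forward)
```

Because the acquired angles are only coarse approximations, such matrices would serve as initial estimates rather than ground truth.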
For the yaw rotations, subjects align themselves by rotating the chair on which they sit to align with stripes placed on the floor corresponding to various angles. For pitch and cross rotations, the subjects are required to look at marks placed on the walls by turning their heads only i. Thus, a coarse approximation of rotation angles can be obtained. Second, for the eyeglass occlusion, subjects used different eyeglasses from a pool.
Head poses and occlusions in the Bosphorus database.

It is able to capture a face in less than a second. Subjects are made to sit at a distance of about 1. A W halogen lamp was used in a dark room to obtain homogeneous lighting. This does not only affect the texture image of the face but can also cause noise in the 3D data. Scanner software is used for acquisition and 3D model reconstruction. The reconstruction from the acquired image data is performed during the acquisition session right after the scanning.
Although somewhat time-consuming, this guarantees that faulty acquisitions are detected and hence can be repeated. In this phase the data are also segmented manually by selecting a polygonal face region. In order to remove noise, several basic filtering operations such as Gaussian and median filtering are applied.
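The two filters named above can be sketched on a depth map as follows. This is a minimal pure-Python illustration and not the scanner software's implementation: treating the data as a 2D grid of depth values, the window radius, the `sigma` value, and the border handling (clamping to the nearest valid sample) are all assumptions.

```python
import math
from statistics import median


def _clamp(i, n):
    """Clamp an index to [0, n - 1] so borders reuse the nearest sample."""
    return max(0, min(n - 1, i))


def median_filter(depth, radius=1):
    """Replace each value by the median of its (2r+1) x (2r+1) window."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [depth[_clamp(y + dy, h)][_clamp(x + dx, w)]
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)]
            out[y][x] = median(window)
    return out


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def gaussian_filter(depth, sigma=1.0, radius=2):
    """Separable Gaussian smoothing: horizontal pass, then vertical pass."""
    k = gaussian_kernel(sigma, radius)
    h, w = len(depth), len(depth[0])
    offsets = range(-radius, radius + 1)
    tmp = [[sum(k[i + radius] * depth[y][_clamp(x + i, w)] for i in offsets)
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + radius] * tmp[_clamp(y + i, h)][x] for i in offsets)
             for x in range(w)] for y in range(h)]


# A flat surface with one spike, as hair or eyelashes might produce.
depth_map = [[10.0] * 5 for _ in range(5)]
depth_map[2][2] = 100.0
despiked = median_filter(depth_map)
smoothed = gaussian_filter(despiked)
```

The median filter is applied first here because it removes isolated spikes outright, whereas Gaussian smoothing alone would spread a spike over its neighbours.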
A segmented 3D face consists of approximately 35K points. In the sequel, we discuss the pose, expression and occlusion modalities of the face and evaluate the quality of the data. This database contains a great amount of variation for each individual due to expressions, head poses and occlusions, as explained in Section 2. Important characteristics of these variations are discussed below.
Not all subjects could properly produce all AUs: some of them were not able to activate the related muscles, or could not control them. Therefore, a few expressions are not available in the database for some of the subjects. Also, the captured AUs need to be validated by trained AU experts.
Second, since no video acquisition was possible for this database, the AUs were captured at their peak intensity levels, which were judged subjectively.
Notice that there was no explicit control for the valence of pronounced expressions. As in any other database, acted expressions are neither spontaneous nor thoroughly natural.
All these factors constitute the limitations of this database for expression studies. Although various angles of poses were acquired, they are only approximations.
Especially poses including pitch rotations can be subject dependent, since subjects are requested to look at marks placed in the environment.
This introduces slight angle differences due to the difference of rotation centres changing from subject to subject. Eye rotation may also cause some difference, though the subjects were warned in that case. The subject to subject variation of occlusions is more pronounced as compared to expression variations. For instance, while one subject occludes his mouth with the whole hand, another one may occlude it with one finger only; or hair occlusion on the forehead may vary a lot in tassel size and location.
The quality of the acquired data can be quite important depending on the application. Due to the 3D digitizing system and the setup conditions, significant noise may occur. To reduce noise, we tried to optimize the acquisition setup experimentally by trying different lighting conditions and controlling the camera and subject distances.
However, there are other sources of problems. These are explained below. In the middle row, noise due to hair, movement, and facial hair is seen. At the bottom left, a mistake in the depth level of the tongue, and at the right, its correction is displayed.
A comfortable seat with a headrest was used to diminish the subject movements during long acquisition sessions. However, this problem can also happen for instance due to breathing or muscle contractions during expressions.
In the database, movement noise emerges especially in the case of expressions, but it depends on the subject and occurs only occasionally. An example is shown at the middle row of Fig.
Data on hair and facial hair, such as beard and eyebrows, generally cause spiky noise. Spiky surfaces also arise over the eyes.
Basic smoothing filtering reduces these types of noises Fig. The consequences are holes in the facial data, and uncompleted and distorted facial contours. Holes are formed due to missing data, mostly at the sides of the nose. Even slight head rotations generate high amount of self occlusions. In Fig.
Bosphorus Database for 3D Face Analysis