FaceGen SDK Manual - In Depth

Statistical Appearance Models (SAMs)

We make use of a standard statistical technique called principal components analysis (PCA) to summarize the distribution of samples in a data set.

The basic idea is to represent a data set of shapes or images by a single mean (average) instance, plus a small number of PCA modes (i.e. far fewer than the number of samples in the data set), which are deltas from the mean. We can then reconstruct any number of faces by linearly combining these PCA modes and adding them to the mean.

PCA modes have several useful properties which are essentially impossible to achieve using artist-created controls:

  1. The modes characterize the actual statistical distribution of the samples, allowing us to create controls which always look realistic.
  2. The modes are statistically independent. This means that picking random coefficients for each mode according to a standard normal distribution (i.e. a random face) gives us instances which are just as plausible as the others in our data set.
  3. The modes are optimal for reconstructing the samples in the least squares error sense. This means that we can achieve a high level of compression by storing faces just by their mode coefficients.

Mathematically, the modes form a basis for our face space. The modes are statistically orthogonal and their magnitudes correspond to one standard deviation in our data set.

The PCA modes are stored in our file formats in decreasing order of their magnitudes. Thus if you require a smaller number of basis vectors due to resource constraints, you retain the optimal least squares reconstruction property by removing modes from the end.

Statistical Shape Models (SSMs)

Shape is represented by the vertex positions of a polygonal model of fixed mesh topology. The mean model is a standard polygonal model consisting of triangular and/or quad facets. The modes are one-standard-deviation 3D displacements for each vertex.

The shape modes of an SSM preserve vertex-feature correspondence. For example, the vertex at the tip of the nose of the mean face will remain at the tip of the nose for any face constructed using the SSM.

FaceGen SSMs are statistically defined only over the face region. This area extends from about half-way up the forehead to just below where the chin meets the neck, and does not include the ears. Shape modification in areas outside of this face region is just an extrapolation from the face region shape statistics. There are no specific controls for the ears, neck or back of the head.

Statistical Color Models (SCMs)

The mean color map is a 24-bit RGB color image. Each mode is stored as a signed RGB8 map, along with a floating point scaling factor. Vertex UV coordinates do not change.

Texture modes of an SCM preserve UV-feature correspondence. For example, the pixel at the corner of the eye will remain at the corner of the eye for all generated faces. The exception is eyebrows, which, because of their variable positioning on the face, are not always located at the same place in the texture map.

FaceGen SCMs are statistically defined only over the face region. Texture values in areas outside of the face region are extrapolated from the face region and are thus always skin-colored.

You can modify the SCM mean image to add any features whose color will vary with the skin color, such as wrinkles or tattoos. Hair textures or other opaque objects whose color does not depend on skin color should be composited on after creation of the final color map.

The SCM has been constrained to force the color around the middle of the neck and below to be constant, for simpler integration.

Model Sets

Realistic changes in face shape inevitably affect the shape of the whole head; long thin faces have longer thinner heads, and short wide faces have shorter, wider heads. Thus modes that represent different faces must also include some morphing of the entire head.

This can make it difficult to add accessory models such as hairstyles or facial hair to a morphed head model. The solution is to have a separate SSM for each accessory model. If the same SSM coefficients are applied to both the head model and its accessory models, they will always fit together seamlessly.

Similarly, a SAM can be generated for each level-of-detail model for the head, allowing the same face to be applied to any of the LOD models.

A head model plus its LOD and accessory models is referred to as a model set. The SDK includes the FaceGen default model set, but you can always add models or model sets using the 'fg3t' tools.

In general, the head model can also be broken into any number of SAMs, for instance if you want the eyes to be a separate model in FaceGen. This can be useful if you require a separate texture map for the eyes.

For example, the FaceGen Default model set consists of a low, medium and high resolution SAM for the skin and separate models for each eye. The Aqua model set, however, has a SAM for the face area (including the eyes) and a SAM for the back of the head.

Detail Texture Modulation

After reconstruction of the color map using an SCM, the resulting image contains details which are well characterized statistically, such as the lip boundaries, because they are uniquely identifiable across every face.

Other details, such as wrinkles and skin texture, are not uniquely identifiable across every face, and these get smoothed out by our statistical approach. In order to put them back in, we use the concept of a detail texture.

A detail texture is a modulation map which is applied after reconstruction of the color map from the SCM. By taking this approach, the same detail texture can be applied to any reconstructed texture.

Modulation just means that each R, G and B component of each pixel of the statistical texture is multiplied by a factor defined by the corresponding pixel in the modulation map. Each such factor is coded as a single unsigned byte in the range [0,255], with the modulation factor calculated by dividing by 64 (so a stored value of 64 leaves the statistical texture unchanged).
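As a minimal illustrative sketch (not an SDK call), modulation is just a per-component multiply. This assumes the detail map has already been transformed into the same UV layout and resolution as the color map; the function name and flat-array layout here are assumptions:

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>

    // Multiply each color component by its modulation factor (byte / 64),
    // clamping the result to the valid 8-bit range.
    void applyDetailTexture(const std::uint8_t* statTex,   // reconstructed color map (RGB8)
                            const std::uint8_t* detailTex, // modulation map (one byte per component)
                            std::uint8_t*       result,    // output color map (RGB8)
                            std::size_t         numComponents) // width * height * 3
    {
        for (std::size_t c = 0; c < numComponents; ++c) {
            int modulated = int(statTex[c]) * int(detailTex[c]) / 64;
            result[c] = std::uint8_t(std::min(modulated, 255));
        }
    }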

Detail textures can be created artistically (see the Modeller documentation), by using the FaceGen PhotoFit, or by using those provided with Modeller. In the PhotoFit method, after creating a 3D face to match an image, any remaining differences between the photograph and the rendering of the reconstructed face are used to create the detail texture.

The detail texture is only defined over the face area, which is the region from about half-way up the forehead, around in front of the ears, and down to the neck just below the jaw.

In an FG file the detail texture is stored in a cylindrically projected UV layout. The image transform file (FIM) for an SCM defines the transform of the detail texture from this layout to the UV layout of the particular mesh.

The size of the detail texture is determined by the size of the input photograph. To limit the size of the detail texture created by the PhotoFit, just limit the size of the input photograph.

File Storage

There is one SSM for each vertex list and one SCM for each texture. A SAM can consist of an SSM, an SCM, or both. A SAM consists of one or more of the following files:

TRI           The polygonal mesh: vertex list, facet lists and texture coordinates.
EGM           The shape statistics (the SSM modes).
EGT           The color statistics (the SCM modes).
FIM           The image transform for the detail texture (see Detail Texture Modulation).

Multiple SAMs composing a face, its parts, and/or its various level-of-detail and accessory models are referred to as a model set.

For example, the FaceGen default model set, which is stored in the 'csamDefault' directory, contains low, medium and high-resolution models of the skin, low and high-res eye models, tongue, teeth, sock, hairstyles and glasses. Only the skin and eye models have EGT and FIM files associated with them since the texture images of the other models are fixed and have no texture statistics.

Controls Data

The distribution data characterizes the variation in facial appearance within each racial group. The controls data characterizes the effects of the many parametric controls offered by FaceGen, including age, race and gender. The distribution and controls data is stored in the file 'si.ctl'.

SAM Definitions

An SSM consists of:

\( \mathbf{\bar{v}_{i}} \)           The mean model i'th vertex position (a 3D position vector).
\( \mathbf{v_{i}^{j}} \)           The i'th vertex displacement of the j'th symmetric shape mode (a 3D displacement vector).
\( \mathbf{u_{i}^{k}} \)           The i'th vertex displacement of the k'th asymmetric shape mode (a 3D displacement vector).

And given a set of shape coefficients:

\( s_{j} \)            The j'th symmetric shape mode coefficient.
\( a_{k} \)            The k'th asymmetric shape mode coefficient.
\( \mathit{N_{s}} \)            The number of symmetric shape modes (currently 50).
\( \mathit{N_{a}} \)            The number of asymmetric shape modes (currently 30).

We can build the specific head defined by the shape coefficients as: \[ \mathbf{v_{i}^{'}}=\mathbf{\bar{v}_{i}}+\sum_{j=1}^{N_{s}}(\mathit{s_{j}}\mathbf{v_{i}^{j}})+\sum_{k=1}^{N_{a}}(\mathit{a_{k}}\mathbf{u_{i}^{k}}) \]
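This reconstruction is a straightforward linear combination over all vertices. A minimal sketch, assuming modes stored as plain vertex arrays (the types and function name here are illustrative, not the SDK API):

    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };

    // v'_i = mean_i + sum_j s_j * symModes[j][i] + sum_k a_k * asymModes[k][i]
    std::vector<Vec3> reconstructShape(const std::vector<Vec3>&              mean,
                                       const std::vector<std::vector<Vec3>>& symModes,
                                       const std::vector<std::vector<Vec3>>& asymModes,
                                       const std::vector<float>&             s,
                                       const std::vector<float>&             a)
    {
        std::vector<Vec3> out = mean;
        for (std::size_t j = 0; j < symModes.size(); ++j)
            for (std::size_t i = 0; i < out.size(); ++i) {
                out[i].x += s[j] * symModes[j][i].x;
                out[i].y += s[j] * symModes[j][i].y;
                out[i].z += s[j] * symModes[j][i].z;
            }
        for (std::size_t k = 0; k < asymModes.size(); ++k)
            for (std::size_t i = 0; i < out.size(); ++i) {
                out[i].x += a[k] * asymModes[k][i].x;
                out[i].y += a[k] * asymModes[k][i].y;
                out[i].z += a[k] * asymModes[k][i].z;
            }
        return out;
    }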

The definition of an SCM is mathematically identical; we just re-define our variables as:

\( \mathbf{\bar{v}_{i}} \)            The mean texture i'th pixel value (a 3-component color value).
\( \mathbf{v_{i}^{j}} \)            The i'th pixel value of the j'th symmetric texture mode (a 3-component color value).
\( \mathbf{u_{i}^{k}} \)            The i'th pixel value of the k'th asymmetric texture mode (a 3-component color value).

And so on.

We refer to the set of coefficients for both symmetric and asymmetric, shape and texture as a coordinate vector, \( \mathbf{p} \), defining a point in face space.

Freeform Deformations

Freeform deformations are defined on an SSM.

The input for a freeform deformation control is a vertex index, \( \mathit{f} \), and a deformation vector, \( \mathbf{f_{\mathit{f}}} \).
A symmetric face space delta is then given by:

\[ d_{j}= \sqrt{\frac{I}{\sum_{i=1}^{I}\left \| \mathbf{v}_{i}^{j} \right \|^{2}}}\mathbf{f}_{f}\cdot \mathbf{v}_{f}^{j}\]

Where \( I \) is the number of vertices in the SSM.

The face coordinate is then modified by:

\[ s_{j}^{'}=s_{j}+d_{j}\]

Similarly, the asymmetric face space delta is given by:

\[ d_{k}= \sqrt{\frac{I}{\sum_{i=1}^{I}\left \| \mathbf{u}_{i}^{k} \right \|^{2}}}\mathbf{f}_{f}\cdot \mathbf{u}_{f}^{k}\]

And the face coordinate is modified by:

\[ a_{k}^{'}=a_{k}+d_{k}\]
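A sketch of the symmetric half of this update follows; the asymmetric half is identical with \( \mathbf{u_{i}^{k}} \) and \( a_{k} \) in place of \( \mathbf{v_{i}^{j}} \) and \( s_{j} \). The container types and function name are assumptions:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };

    static float dot(const Vec3& a, const Vec3& b)
    {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    // Update the symmetric coefficients s_j given a user drag of vertex f
    // by the deformation vector ff.
    void freeformDeform(const std::vector<std::vector<Vec3>>& symModes, // v_i^j
                        std::size_t         f,   // selected vertex index
                        const Vec3&         ff,  // deformation vector f_f
                        std::vector<float>& s)   // coefficients, updated in place
    {
        std::size_t I = symModes[0].size();      // number of vertices
        for (std::size_t j = 0; j < symModes.size(); ++j) {
            float sumSqr = 0;                    // sum over i of |v_i^j|^2
            for (std::size_t i = 0; i < I; ++i)
                sumSqr += dot(symModes[j][i], symModes[j][i]);
            float dj = std::sqrt(float(I) / sumSqr) * dot(ff, symModes[j][f]);
            s[j] += dj;                          // s_j' = s_j + d_j
        }
    }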

Linear Controls

The shape and texture controls (both symmetric and asymmetric) demonstrated in their respective tabs of the Modeller program are linear controls. Each linear control is defined as a direction vector of unit magnitude, \( \mathbf{c_{i}} \), within one of the sub-spaces (shape or texture, symmetric or asymmetric):

\[ \left \| \mathbf{c_{i}} \right \|^{2}=1 \]

Given a set of linear controls, \( \mathbf{c_{i}} \), in this sub-space, and a face coordinate sub-vector in this sub-space, \( \mathbf{p} \), the linear control value is given by:

\[ c_{i}=\mathbf{c_{i}}\cdot \mathbf{p} \]

Modification of a face to have a desired value \( c_{i}^{'} \) for this linear control is given by:

\[ \mathbf{p^{'}=p+\Delta c_{i}} \]

Where

\[ \Delta = c_{i}^{'}-c_{i} \]

So now when we measure the linear control value of \( \mathbf{p^{'}} \) we get:

\[ c_{i}^{'}=\mathbf{c_{i}}\cdot \mathbf{p^{'}}=\mathbf{c_{i}}\cdot(\mathbf{p}+\Delta \mathbf{c_{i}})=c_{i}+\Delta \]

As expected.

Note that since the \( \mathbf{c_{i}} \) are not orthogonal, this change of \( \mathbf{p} \) will affect the values \( c_{i} \) of all the other linear controls.
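A minimal sketch of measuring and setting one linear control, assuming face sub-vectors stored as plain arrays (the names here are illustrative, not the SDK API):

    #include <cstddef>
    #include <vector>

    // c_i = c . p : project the face sub-vector onto the control direction.
    float measureControl(const std::vector<float>& c, const std::vector<float>& p)
    {
        float value = 0;
        for (std::size_t i = 0; i < p.size(); ++i)
            value += c[i] * p[i];
        return value;
    }

    // p' = p + (target - current) * c : move along the control direction.
    void setControl(const std::vector<float>& c, std::vector<float>& p, float target)
    {
        float delta = target - measureControl(c, p);
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] += delta * c[i];
    }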

Offset Linear Controls (Age and Gender)

Age and gender are offset linear controls defined within each racial distribution, although they are currently identical for each race. Offset linear controls, \( \mathbf{a_{j}} \), are similar to linear controls, but have a scalar offset, \( \mathit{o_{j}} \), are defined separately in the symmetric shape and symmetric texture sub-spaces, and have arbitrary (rather than unit) magnitude. They do not have an asymmetric component.

Given a sample face coordinate, p, in a symmetric sub-space, the offset linear control value is given by: \[ \mathit{a_{j}}=\mathbf{a_{j}}\cdot\mathbf{p}+\mathit{o_{j}} \]

Age is given in years. Gender of -1 corresponds to males and +1 to females.

And to modify the face coordinate to have a given offset linear control value \( \mathit{a_{j}^{'}} \):

\[ \mathbf{p^{'}=p+\Delta a_{j}} \]

Where:

\[ \Delta=\frac{\mathit{a_{j}^{'}-a_{j}}}{\left \|\mathbf{a_{j}} \right \|^{2}} \]

Again note that the aj are not orthogonal so changing one will affect the others. In order to simultaneously set both age and gender to a desired value, use:

\[ \mathbf{p^{'}=p+\sum{(\Delta_{i}a_{i})}} \]

Where:

\[ \Delta_{i}=\sum_{j}\left [ (M^{-1})_{ij}\mathit{(a_{j}^{'}-a_{j})} \right ] \]

Where:

\[ \mathit{M_{ij}}= \mathbf{a_{i} \cdot a_{j}} \]
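For the two-attribute case this reduces to inverting a 2x2 Gram matrix. A sketch under the same assumptions as above (illustrative names; the axes and offsets would come from the controls data):

    #include <cstddef>
    #include <vector>

    static float dot(const std::vector<float>& a, const std::vector<float>& b)
    {
        float v = 0;
        for (std::size_t i = 0; i < a.size(); ++i)
            v += a[i] * b[i];
        return v;
    }

    // Solve M * Delta = (target - current) where M_ij = a_i . a_j, then
    // apply p' = p + Delta_age * aAge + Delta_gen * aGen.
    void setAgeGender(const std::vector<float>& aAge, float oAge, float targetAge,
                      const std::vector<float>& aGen, float oGen, float targetGen,
                      std::vector<float>&       p)
    {
        float eAge = targetAge - (dot(aAge, p) + oAge);  // a_j' - a_j
        float eGen = targetGen - (dot(aGen, p) + oGen);
        float m00 = dot(aAge, aAge), m01 = dot(aAge, aGen), m11 = dot(aGen, aGen);
        float det = m00 * m11 - m01 * m01;
        float dAge = ( m11 * eAge - m01 * eGen) / det;   // 2x2 inverse
        float dGen = (-m01 * eAge + m00 * eGen) / det;
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] += dAge * aAge[i] + dGen * aGen[i];     // p' = p + sum Delta_i a_i
    }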

Race Controls

The race controls are offset linear controls defined in the combined symmetric space (shape and texture) by the difference in racial mean positions in face space. Their use is as described above except that there is no partition into shape and texture sub-vectors.

Asymmetry

Given the asymmetric sub-vector of a face, p, with dimensionality \( N_{a} \), the asymmetry measure is given by:

\[ \mathit{a}=\frac{\left \| \mathbf{p} \right \|}{\sqrt{\mathit{N_{a}}}} \]

To change the asymmetry value to a new value, \( \mathit{a^{'}} \) , the new sub-vector is given by:

\[ \mathbf{p^{'}}=\mathit{\frac{a^{'}}{a}}\mathbf{p} \]

The asymmetry controls are independent of the offset linear controls.

An asymmetry value of 1 is the average, and most people have values fairly close to 1; a value of 0 represents a perfectly symmetric face.
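A minimal sketch of measuring and rescaling asymmetry on the asymmetric sub-vector (illustrative helper, not an SDK call):

    #include <cmath>
    #include <vector>

    // Measure a = |p| / sqrt(Na), then rescale so the measure equals target.
    void setAsymmetry(std::vector<float>& pa, float target)
    {
        float sumSqr = 0;
        for (float v : pa)
            sumSqr += v * v;
        float a = std::sqrt(sumSqr / pa.size());  // current asymmetry
        for (float& v : pa)
            v *= target / a;                      // p' = (a'/a) p
    }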

Caricature

Unlike asymmetry, caricature is defined on a per-race basis. Caricature is a measure of how far a sample is from the average face of a given race. The average caricature value for samples of that race is 1, and most faces have a caricature value close to 1; a value of 0 represents the average face for that race.

Caricature is defined as the Mahalanobis distance of the symmetric sub-vector from the racial mean within the racial distribution, and is defined separately for geometry and texture components:

\[ \mathit{c}=\left \| \mathbf{q} \right \| \]

Where:

\[ \mathbf{q=M_{p}(p-\mu_{p})} \]

Where \( \mathbf{M_{p}} \) is either the geometry-geometry partition or the texture-texture partition of the racial Mahalanobis transform:

\[ \mathbf{M}=\begin{bmatrix} \mathbf{M_{GG}} & \mathbf{M_{GT}}\\ \mathbf{M_{TG}} & \mathbf{M_{TT}}\\ \end{bmatrix} \]

And \( \mathbf{\mu_{p}} \) is the corresponding sub-vector of the racial mean:

\[ \mathbf{\mu=\begin{bmatrix} \mu_{G}\\ \mu_{T}\\ \end{bmatrix}} \]

To modify p to have a desired caricature value c':

\[ \mathbf{p^{'}}=\mathit{\frac{c^{'}}{c}}\mathbf{(p-\mu_{p})+\mu_{p}} \]
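A sketch of this measure-and-rescale operation, assuming the relevant Mahalanobis partition is available as a dense row-major matrix (illustrative storage and names):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Measure c = |M (p - mu)|, then rescale p about the racial mean so the
    // caricature measure equals target: p' = (c'/c)(p - mu) + mu.
    void setCaricature(const std::vector<float>& M,   // N x N, row-major
                       const std::vector<float>& mu,  // racial mean, length N
                       std::vector<float>&       p,   // face sub-vector
                       float                     target)
    {
        std::size_t N = mu.size();
        float sumSqr = 0;
        for (std::size_t r = 0; r < N; ++r) {
            float q = 0;                           // q_r = sum_c M_rc (p_c - mu_c)
            for (std::size_t c = 0; c < N; ++c)
                q += M[r * N + c] * (p[c] - mu[c]);
            sumSqr += q * q;
        }
        float cVal = std::sqrt(sumSqr);            // current caricature
        for (std::size_t i = 0; i < N; ++i)
            p[i] = (target / cVal) * (p[i] - mu[i]) + mu[i];
    }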

Attribute-Neutral Caricature

It is often desirable not to exaggerate age and gender characteristics as part of the caricature measure. To achieve this, we proceed as above but limited to the subspace independent of these attributes. In this case we calculate:

\[ \mathit{c}=\frac{\left \| \mathbf{q^{''}} \right \|}{\mathit{\sqrt{N_{p}-2}}} \]

Where Np is the dimensionality of the sub-vector and:

\[ \mathbf{q^{''}=M_{p}(p^{''}-\mu_{p})} \]

Where:

\[ \mathbf{p^{''}=p-\Delta_{p}} \]

Where:

\[ \mathbf{\Delta_{p}=\sum_{i}\left [ \hat{b}_{i}(p\cdot \hat{b}_{i}) \right ]} \]

Where \( \mathbf{\hat{b}_{i}} \) are the orthogonal basis vectors of the attribute subspace, which in the case of age and gender can be calculated using Gram-Schmidt orthogonalization from their axes:

\( \mathbf{\hat{b}_{1}=\hat{a}_{g}} \)

\( \mathbf{\hat{b}_{2}=\frac{r}{\left \| r \right \|}} \)

Where

\( \mathbf{r=\hat{a}_{a}-(\hat{a}_{a} \cdot \hat{a}_{g})\hat{a}_{g}} \)

and \( \mathbf{\hat{a}_{a}} \) and \( \mathbf{\hat{a}_{g}} \) are the normalized age and gender axes respectively.

Given a point in face space we can set the caricature to a new value c' by:

\( \mathbf{p^{'}}=\frac{c^{'}}{c}\mathbf{(p-\Delta_{p})+\Delta_{p}} \)     or

\( \mathbf{p^{'}}=\frac{c^{'}}{c} \mathbf{p}+(1-\frac{c^{'}}{c}) \mathbf{\Delta_{p}} \)

This control is independent of age and gender.
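A sketch of the attribute-subspace projection: build the second basis vector by Gram-Schmidt, then subtract the attribute components from \( \mathbf{p} \). The names are illustrative and the input axes are assumed already normalized:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    static float dot(const std::vector<float>& a, const std::vector<float>& b)
    {
        float v = 0;
        for (std::size_t i = 0; i < a.size(); ++i)
            v += a[i] * b[i];
        return v;
    }

    // Gram-Schmidt: b2 = r / |r| where r = aAge - (aAge . b1) b1.
    std::vector<float> orthonormalize(std::vector<float> aAge,
                                      const std::vector<float>& b1)
    {
        float proj = dot(aAge, b1);
        for (std::size_t i = 0; i < aAge.size(); ++i)
            aAge[i] -= proj * b1[i];
        float norm = std::sqrt(dot(aAge, aAge));
        for (float& v : aAge)
            v /= norm;
        return aAge;
    }

    // p'' = p - sum_i b_i (p . b_i), removing the attribute components.
    std::vector<float> removeAttributes(const std::vector<std::vector<float>>& basis,
                                        std::vector<float> p)
    {
        for (const auto& b : basis) {
            float proj = dot(b, p);
            for (std::size_t i = 0; i < p.size(); ++i)
                p[i] -= proj * b[i];
        }
        return p;
    }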

Random Generate

To generate the symmetric component, \( \mathbf{{p}}_{s} \), of a random face from a racial distribution, we generate \( {N}_{s} \) random values, \( \mathbf{{q}}_{s} \), from a standard normal distribution and transform from the Mahalanobis space of the distribution to face space:

\[ \mathbf{{p}_{s}=M^{-1}q_{s}+\mu} \]

Note that this is done in the combined symmetric space - including both geometry and texture dimensions.

The asymmetry distribution is just the face space basis (for all races) so we generate an asymmetric sub-vector by simply generating more random values from a standard normal distribution:

\[ \mathbf{{p}_{a}=q_{a}} \]
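A sketch of the symmetric generation step using the standard library's normal distribution (illustrative storage; Minv stands for the inverse Mahalanobis transform as a dense row-major matrix):

    #include <cstddef>
    #include <random>
    #include <vector>

    // p_s = M^-1 q_s + mu, where q_s is a vector of standard normal draws.
    std::vector<float> randomSymmetric(const std::vector<float>& Minv, // N x N, row-major
                                       const std::vector<float>& mu,   // racial mean
                                       std::mt19937&             rng)
    {
        std::size_t N = mu.size();
        std::normal_distribution<float> gauss(0.0f, 1.0f);
        std::vector<float> q(N);
        for (float& v : q)
            v = gauss(rng);                        // q_s ~ N(0, 1)
        std::vector<float> p(N);
        for (std::size_t r = 0; r < N; ++r) {
            float sum = 0;                         // (M^-1 q_s)_r
            for (std::size_t c = 0; c < N; ++c)
                sum += Minv[r * N + c] * q[c];
            p[r] = sum + mu[r];
        }
        return p;
    }

    // The asymmetric sub-vector p_a is simply N_a more standard normal draws.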

Note that the gender distribution modelled in this manner is unimodal, whereas we expect the real gender distribution to be bimodal. As we do not currently model male and female distributions separately, the best way to simulate this, if desired, is to set the gender attribute according to a bimodal distribution, as described above.

Likewise, it can also be useful to set age, caricature and/or asymmetry to a previously fixed value, as described above.

Morphing

Stat morphs are applied to a SAM-constructed model by:

\[ \mathbf{v}_{i}^{T}=\mathbf{v}_{i}^{'}+\sum_{j}\left [ t_{j}(\mathbf{t}_{i}^{'j}-\mathbf{v}_{i}^{'}) \right ] \]

Where \( \mathbf{t_{i}^{'j}} \) is the SSM-transformed target position for the i'th vertex of the j'th stat morph target and \( t_{j} \) is the j'th stat morph coefficient.

Diff morphs are then applied to a SAM-constructed model by:

\[ \mathbf{v}_{i}^{D}=\mathbf{v}_{i}^{T}+\sum_{j}(d_{j}\mathbf{d}_{i}^{j})\]

Where \( \mathbf{d}_{i}^{j} \) is the displacement vector of the i'th vertex of the j'th diff morph and \( d_{j} \) is the j'th diff morph coefficient.
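A sketch of applying both morph types in sequence (illustrative types; the coefficients, targets and deltas would come from the model data):

    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };

    // First apply stat morphs, interpolating toward each SSM-transformed
    // target: v += t_j (t_i'^j - v_i'); then diff morphs: v += d_j d_i^j.
    void applyMorphs(std::vector<Vec3>&                    v,           // SSM result v_i'
                     const std::vector<std::vector<Vec3>>& statTargets, // t_i'^j
                     const std::vector<float>&             t,
                     const std::vector<std::vector<Vec3>>& diffDeltas,  // d_i^j
                     const std::vector<float>&             d)
    {
        std::vector<Vec3> base = v;                // keep v_i' for the deltas
        for (std::size_t j = 0; j < statTargets.size(); ++j)
            for (std::size_t i = 0; i < v.size(); ++i) {
                v[i].x += t[j] * (statTargets[j][i].x - base[i].x);
                v[i].y += t[j] * (statTargets[j][i].y - base[i].y);
                v[i].z += t[j] * (statTargets[j][i].z - base[i].z);
            }
        for (std::size_t j = 0; j < diffDeltas.size(); ++j)
            for (std::size_t i = 0; i < v.size(); ++i) {
                v[i].x += d[j] * diffDeltas[j][i].x;
                v[i].y += d[j] * diffDeltas[j][i].y;
                v[i].z += d[j] * diffDeltas[j][i].z;
            }
    }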