I was reading about pre-integrated skin shading and screen-space subsurface scattering in the associated GPU Pro 360 books, which got me really interested in learning about ways to represent subsurface scattering—a type of diffuse lighting—in a rendering pipeline. I came across another article on realistic real-time skin rendering on mobile, and it occurred to me that what all these recent techniques have in common is an impetus toward optimizing rendering times by simplifying the determination of subsurface scattering that can put a heavy load on your frame rates. Indeed, pre-integrated and screen-space methods are meant to be evolutions of the canonical texture-space diffusion workflow as first described in an article in GPU Gems 3.

**Kelemen/Szirmay-Kalos Specular Surface Reflectance**

Returning to the brass tacks, I wanted to fully flesh out and implement a skin rendering mode for my renderer to get a feel for how to effectively integrate data on skin color in a manner that allows us to experience physically plausible results in real-time. The basis for the specular reflectance of these “skin materials” is a precomputed texture that allows us to calculate in real-time the Kelemen/Szirmay-Kalos specular BRDF and subsequent reflectance value. That’s not terribly interesting because we’re essentially distilling much of the BRDF calculation into something that looks like this:

Obviously, it’s a hell of a lot faster to draw our non-Fresnel components from this than to compute them in real-time.

There’s an additional dependency that we need to feed into the specular calculation function that simultaneously acts as a roughness parameter (m) that we use to query into the Beckmann texture shown above, as well as a light intensity modulation factor that scales the result of our function. It looks something like this:

I suppose this requires a bit of an explanation. What this allows us to do is vary the specular parameters over our head model. The reason for this odd-looking texture is that it’s based on a study by Weyrich (2006) that measured roughness and intensity modulation factors for ten distinct regions of the face, using a sample size of 149 face images. Hence, for each region of our face, we can apply specific **m **and **rho_s** values when calculating the exact specular reflectance at a given pixel. Using a point light, the result is going to look something like this:

For reference, this is the complete determination of specular lighting in our demo:

const vec3 T = normalize(dFdx(fs_in.WorldPos) * dFdy(fs_in.TexCoords).t - dFdy(fs_in.WorldPos) * dFdx(fs_in.TexCoords).t); vec3 calcWorldSpaceNormal(vec3 tangentSpaceNormal) { vec3 N = normalize(fs_in.WorldNormal); vec3 B = -normalize(cross(N, T)); mat3 TBN = mat3(T, B, N); return normalize(TBN * tangentSpaceNormal); } const vec3 TANGENT_SPACE_N = texture(gNormalMap, fs_in.TexCoords).xyz * 2.0f - 1.0f; const vec3 N = calcWorldSpaceNormal(TANGENT_SPACE_N); const vec3 L = normalize(gLightPos - fs_in.WorldPos); const vec3 V = normalize(gViewPos - fs_in.WorldPos); const vec3 LIGHT_SPACE_POS_POST_W = fs_in.LightSpacePos.xyz / fs_in.LightSpacePos.w * 0.5f + 0.5f; const vec2 FLIPPED_UV = vec2(fs_in.TexCoords.x, 1.0f - fs_in.TexCoords.y); float fresnelReflectance(float cosTheta, float F0) { cosTheta = 1.0f - cosTheta; cosTheta = pow(cosTheta, 5); return cosTheta + (1.0f - cosTheta) * F0; } float KS_Skin_Specular(float m, float rho_s) { float result = 0.0f; float NoL = dot(N, L); if (NoL > 0.0f) { vec3 h = L + V; vec3 H = normalize(h); float NoH = dot(N, H); float PH = pow(2.0f * texture(gBeckmannTexture, vec2(NoH, m)).r, 10); float cosTheta = dot(H, V); float F = fresnelReflectance(cosTheta, 0.028f); float frSpec = max(PH * F / dot(h, h), 0); result = NoL * rho_s * frSpec; // BRDF * dot(N,L) * rho_s } return result; }

**Texture Space Diffusion and Subsurface Scattering**

It’s easy to see that the dependencies I talked about aren’t too constraining on the performance as the required textures are easy to precompute (or you could just pull them from a known source as I did). On the other hand, putting together a workflow to deliver diffuse lighting values per pixel was quite the exercise in organizing collections of render targets. I’m not going to go too deep into the theory behind subsurface scattering—these two resources do a good job of contextualizing the fundamentals and details behind texture-space diffusion and the subsequent application of diffusion profiles. However, I did want to share my own implementation just to lend some insight into how one might generate the requisite inputs for the diffuse lighting calculations.

What *is *important to understand is that our modeling of subsurface scattering generates a set of experimentally determined coefficients (i.e., Gaussian weights) that when combined with different convolutions of our irradiance texture, gives us different aspects or contributions toward the aggregate amount of diffuse lighting. We also utilize a subset of our blurred irradiance textures in combination with stored UV coordinates in our modified translucent shadow map to estimate additional global transmitted light (i.e., light transmitted completely through thin regions such as ears and nostrils) that texture-space diffusion would fail to capture. A full analysis of this secondary extrapolation of diffuse illumination can be found in GPU Gems 3, but the justification for it is pretty simple. Surface locations that are very close in three-dimensional space can be quite distant in texture-space; thus, we estimate diffusion through thin regions by reusing a subset of our convolved irradiance textures.

We can’t simply use a canonical shadow mapper to distill the needed UVs. But we can, thankfully, feed the necessary information to our final render shader, which requires only a small adjustment to our current shadow mapping implementation:

#version 460 core layout (location = 0) out vec3 ShadowMap; in vec2 TexCoords; void main() { ShadowMap.x = gl_FragCoord.z; ShadowMap.y = TexCoords.x; ShadowMap.z = TexCoords.y; }

In addition to the translucent shadow map, we need to generate the stretch and irradiance maps to address several problems that would otherwise occur. The irradiance map and its convolutions (which I’ll show in just a bit) are important because what we’re trying to do is simulate diffusion by blurring the light map (i.e., irradiance) texture one convolution at a time. Next, we take each convolved texture and apply a set of coefficients that differs depending on the number of convolutions that have been applied to that texture (on a per-pixel basis and when calculating the diffuse term). The initial, unconvolved irradiance map is technically considered the first irradiance convolution in our render shader, so we’ll only need five convolutions or blur passes to generate everything we need.

The purpose of the stretch texture is to correct for UV distortion and surface curvature, which it does by modulating the spacing of convolution taps (i.e., samplings) for each location in texture space. Not allowing for this correction can result in artifacts that manifest as a red glow around shadows, waxy-looking diffusion in certain regions, and so forth. The GPU Gems article provides a more complete description of the role that the stretch map plays in the overall implementation.

In addition to the correction aspect, subsequent convolutions of the stretch texture can also optionally serve as inputs for the shader that outputs convolutions of our irradiance map (in place of the normal, unconvolved stretch map). Applying the convolved stretch maps is generally recommended to produce more accurate irradiance map convolutions, and both of our resources provide further explanation of the use of the *k*th stretch map convolution to correct the *k*th convolution of our irradiance texture.

**Setting Up the Framebuffers**

There’s likely more than one way to facilitate a complete generation of the aforementioned textures, but I opted to set up 12 separate framebuffers: one for our shadow map, one for the render targets onto which we generate the initial stretch and irradiance maps, five for the blurred stretch maps, and five for the blurred irradiance maps.

glBindFramebuffer(GL_FRAMEBUFFER, shadowMapRenderTarget->_FBO); glDrawBuffer(GL_COLOR_ATTACHMENT0); glBindFramebuffer(GL_FRAMEBUFFER, textureSpaceInputs->_FBO); irradianceMap = textureSpaceInputs->genAttachment(GL_RGBA32F, GL_RGBA, GL_FLOAT); GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 }; glDrawBuffers(2, bufs); for (size_t i = 0; i < NUM_BLUR_PASSES; ++i) { glBindFramebuffer(GL_FRAMEBUFFER, blurStretchPassesRenderTargets[i]->_FBO); blurredStretchMaps[i] = blurStretchPassesRenderTargets[i]->genAttachment(GL_RGBA32F, GL_RGBA, GL_FLOAT); glDrawBuffers(2, bufs); } for (size_t i = 0; i < NUM_BLUR_PASSES; ++i) { glBindFramebuffer(GL_FRAMEBUFFER, blurPassesRenderTargets[i]->_FBO); blurredIrradianceMaps[i] = blurPassesRenderTargets[i]->genAttachment(GL_RGBA32F, GL_RGBA, GL_FLOAT); glDrawBuffers(2, bufs); }

I should probably mention that in my engine, I initialize a color attachment by default within the constructor of a framebuffer object (FBO). Hence, the default render target is obviously the shadow map in the **shadowMapRenderTarget** FBO. For the **textureSpaceInputs** FBO, the default render target is where the stretch map is written, and for the two sets of framebuffers for our blurred maps, the default render target for each FBO stores the U-convolution of the two-pass blur. My strategy here was to have each blur-pass write to an intermediate render target before capturing the result from the V-convolution shader onto a render target that maps to a list of texture IDs that we will use in our render shader. This is pretty much the ping-pong FBO idea that’s commonly used for separable filters.

Looking at the big picture, this is how we’re going to want to execute the overall texture generation:

void execTextureSpaceInputsPass(const phoenix::Shader& shader, phoenix::Model& object) { glViewport(0, 0, phoenix::HIGH_RES_WIDTH, phoenix::HIGH_RES_HEIGHT); glBindFramebuffer(GL_FRAMEBUFFER, textureSpaceInputs->_FBO); glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, shadowCommon->_objectTexture); shadowCommon->renderObject(utils, shader, object, TRANSLATION, ROTATION, SCALE); glBindFramebuffer(GL_FRAMEBUFFER, 0); } void execBlurPasses(const phoenix::Shader& convolveStretchUShader, const phoenix::Shader& convolveStretchVShader, const phoenix::Shader& convolveUShader, const phoenix::Shader& convolveVShader) { glViewport(0, 0, phoenix::HIGH_RES_WIDTH, phoenix::HIGH_RES_HEIGHT); for (size_t i = 0; i < NUM_BLUR_PASSES; ++i) { glBindFramebuffer(GL_FRAMEBUFFER, blurStretchPassesRenderTargets[i]->_FBO); glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); convolveStretchUShader.use(); glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, i > 0 ? blurredStretchMaps[i - 1] : textureSpaceInputs->_textureID); utils->renderQuad(convolveStretchUShader); convolveStretchVShader.use(); glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, blurStretchPassesRenderTargets[i]->_textureID); utils->renderQuad(convolveStretchVShader); glBindFramebuffer(GL_FRAMEBUFFER, blurPassesRenderTargets[i]->_FBO); glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); convolveUShader.use(); glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, i > 0 ? blurredIrradianceMaps[i - 1] : irradianceMap); glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, blurredStretchMaps[i]); utils->renderQuad(convolveUShader); convolveVShader.use(); glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, blurPassesRenderTargets[i]->_textureID); glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, blurredStretchMaps[i]); utils->renderQuad(convolveVShader); } glBindFramebuffer(GL_FRAMEBUFFER, 0); }

There’s nothing really special about generating the initial, unblurred texture-space maps, which we’re going to want to do after the shadow map-pass in the main game loop because we’re pretty much just rendering an object and having our fragment shader map the necessary information onto a pair of render targets:

#version 460 core layout (location = 0) in vec3 gPos; layout (location = 1) in vec2 gTexCoords; out vec3 WorldPos; out vec2 TexCoords; uniform mat4 gWorldMatrix; void main() { WorldPos = vec3(gWorldMatrix * vec4(gPos, 1.0f)); TexCoords = vec2(gTexCoords.x * 2.0f - 1.0f, (1.0f - gTexCoords.y) * 2.0f - 1.0f); gl_Position = vec4(TexCoords, 0.0f, 1.0f); }

#version 460 core layout (location = 0) out vec4 StretchMap; layout (location = 1) out vec4 IrradianceMap; const float SCALE = 0.0003f; in vec3 WorldPos; in vec2 TexCoords; uniform sampler2D gDiffuseTexture; vec2 computeStretchMap() { vec3 derivu = dFdx(WorldPos); vec3 derivv = dFdy(WorldPos); float stretchU = SCALE / length(derivu); float stretchV = SCALE / length(derivv); return vec2(stretchU, stretchV); } void main() { StretchMap = vec4(computeStretchMap(), 0.0f, 1.0f); vec2 texCoords = TexCoords * 0.5f + 0.5f; texCoords.y = 1.0f - texCoords.y; IrradianceMap = texture(gDiffuseTexture, texCoords); }

If executed correctly, the initial texture-space map generation shaders should give us these:

One small thing to note is that you might want to play around with the **SCALE **factor just to make sure that as many of the stretch values as possible map to the [0, 1] range.

**Texture-Space Convolution Passes**

For the blur-passes, the basic idea is to perform a horizontal blur of our irradiance texture and store the result in an intermediate render target in the corresponding framebuffer. Then, we perform a vertical blur of that intermediate convolution and store the resulting blurred irradiance map in a different render target that we’ll access from our final shader. Subsequent blur passes will operate using the blurred irradiance map from the previous blur-pass as an input as opposed to the initial, unblurred irradiance texture. Here is the pair of fragment shaders that achieves our desired effect given some canonical texture or quad read vertex shader:

#version 460 core layout (location = 0) out vec4 BlurredIrradianceMap; const float GAUSS_WIDTH = 15.0f; in vec2 TexCoords; uniform sampler2D gIrradianceMap; uniform sampler2D gStretchMap; vec4 convolveU() { float scaleConv = 1.0f / 1024.0f; vec4 stretch = texture(gStretchMap, TexCoords); float netFilterWidth = scaleConv * GAUSS_WIDTH * stretch.x; // Gaussian curve - standard deviation of 1.0 float curve[7] = {0.006f, 0.061f, 0.242f, 0.383f, 0.242f, 0.061f, 0.006f}; vec2 coords = TexCoords - vec2(netFilterWidth * 3.0f, 0.0f); vec4 sum = vec4(0.0f); for (int i = 0; i < 7; ++i) { vec4 tap = texture(gIrradianceMap, coords); sum += curve[i] * tap; coords += vec2(netFilterWidth, 0.0f); } return sum; } void main() { BlurredIrradianceMap = convolveU(); }

#version 460 core layout (location = 1) out vec4 BlurredIrradianceMap; const float GAUSS_WIDTH = 15.0f; in vec2 TexCoords; uniform sampler2D gIrradianceMap; uniform sampler2D gStretchMap; vec4 convolveV() { float scaleConv = 1.0f / 1024.0f; vec4 stretch = texture(gStretchMap, TexCoords); float netFilterWidth = scaleConv * GAUSS_WIDTH * stretch.y; // Gaussian curve - standard deviation of 1.0 float curve[7] = {0.006f, 0.061f, 0.242f, 0.383f, 0.242f, 0.061f, 0.006f}; vec2 coords = TexCoords - vec2(0.0f, netFilterWidth * 3.0f); vec4 sum = vec4(0.0f); for (int i = 0; i < 7; ++i) { vec4 tap = texture(gIrradianceMap, coords); sum += curve[i] * tap; coords += vec2(0.0f, netFilterWidth); } return sum; } void main() { BlurredIrradianceMap = convolveV(); }

As stated before, we’re going to want to blur the stretch map in a similar, incremental fashion before performing the blurs on the same level or iteration of our irradiance texture because we will be using the resulting blurred stretch map to calculate the corresponding blurred irradiance convolution. The shaders for this are going to look similar to what we have for the irradiance convolutions except that we’ll sample from the input stretch map instead:

#version 460 core layout (location = 0) out vec4 BlurredStretchMap; const float GAUSS_WIDTH = 15.0f; in vec2 TexCoords; uniform sampler2D gStretchMap; vec4 convolveU() { float scaleConv = 1.0f / 1024.0f; vec4 stretch = texture(gStretchMap, TexCoords); float netFilterWidth = scaleConv * GAUSS_WIDTH * stretch.x; // Gaussian curve - standard deviation of 1.0 float curve[7] = {0.006f, 0.061f, 0.242f, 0.383f, 0.242f, 0.061f, 0.006f}; vec2 coords = TexCoords - vec2(netFilterWidth * 3.0f, 0.0f); vec4 sum = vec4(0.0f); for (int i = 0; i < 7; ++i) { vec4 tap = texture(gStretchMap, coords); sum += curve[i] * tap; coords += vec2(netFilterWidth, 0.0f); } return sum; } void main() { BlurredStretchMap = convolveU(); }

#version 460 core layout (location = 1) out vec4 BlurredStretchMap; const float GAUSS_WIDTH = 15.0f; in vec2 TexCoords; uniform sampler2D gStretchMap; vec4 convolveV() { float scaleConv = 1.0f / 1024.0f; vec4 stretch = texture(gStretchMap, TexCoords); float netFilterWidth = scaleConv * GAUSS_WIDTH * stretch.y; // Gaussian curve - standard deviation of 1.0 float curve[7] = {0.006f, 0.061f, 0.242f, 0.383f, 0.242f, 0.061f, 0.006f}; vec2 coords = TexCoords - vec2(0.0f, netFilterWidth * 3.0f); vec4 sum = vec4(0.0f); for (int i = 0; i < 7; ++i) { vec4 tap = texture(gStretchMap, coords); sum += curve[i] * tap; coords += vec2(0.0f, netFilterWidth); } return sum; } void main() { BlurredStretchMap = convolveV(); }

Just to get a sense of texture-space diffusion at work, here’s what we can expect from our suite of convolution shaders:

It’s also possible to perform texture-space diffusion without incorporating the diffuse color information directly but instead normalizing the diffusion profiles to a white color and focusing our calibrations on the individual color channels (i.e., focusing on diffusing the white or diffuse light only). The relevant chapter in GPU Gems 3 compares our approach (i.e., pre-scatter texturing), which convolves the diffuse color data as part of the series of convolutions, to post-scatter texturing, which applies the diffuse color map after executing the scattering computations. It’s a bit of a wash as to which option leads to the better result, but it’s always good to keep alternative options in mind should we get curious about ways to improve the final render.

**Putting It All Together**

That about does it for the hard parts of the overall skin rendering technique, which mostly center on the concept of texture-space diffusion. The rest of the implementation combines these various dependencies in a somewhat intricate, physically based manner that you can get a better feel for by checking out the associated GPU Gems resource. It does seem that the diffuse lighting calculations take up most of the workflow; subsurface scattering is just so crucial to the experience of viewing skin. To give a better perspective of that assertion, one only needs to consider the remainder of the render shader and its result:

#version 460 core out vec4 FragColor; const float MIX = 0.5f; const float SPECULAR_CORRECTION_FACTOR = 50.0f; const float DIFFUSE_CORRECTION_FACTOR = 1.4f; const int NUM_CONVOLUTIONS = 6; const vec3 GAUSSIAN_WEIGHTS[NUM_CONVOLUTIONS] = vec3[] (vec3(0.233f, 0.455f, 0.649f), vec3(0.1f, 0.336f, 0.344f), vec3(0.118f, 0.198f, 0.0f), vec3(0.113f, 0.007f, 0.007f), vec3(0.358f, 0.004f, 0.0f), vec3(0.078f, 0.0f, 0.0f)); in VS_OUT { vec3 WorldPos; vec3 WorldNormal; vec2 TexCoords; vec4 LightSpacePos; } fs_in; uniform sampler2D gDiffuseTexture; uniform sampler2D gShadowMap; uniform sampler2D gNormalMap; uniform sampler2D gBeckmannTexture; uniform sampler2D gStretchMap; uniform sampler2D gSpecularTexture; uniform sampler2D gIrradianceMaps[NUM_CONVOLUTIONS]; uniform vec3 gLightPos; uniform vec3 gViewPos; uniform vec3 gLightColor; uniform float gAmbientFactor; uniform float gSpecularFactor; uniform int gRenderMode; // ... vec4 finalSkinShader() { // The total diffuse light exiting the surface vec3 diffuseLight = vec3(0.0f); vec4 irradTaps[NUM_CONVOLUTIONS]; for (int i = 0; i < NUM_CONVOLUTIONS; ++i) { irradTaps[i] = texture(gIrradianceMaps[i], FLIPPED_UV); diffuseLight += DIFFUSE_CORRECTION_FACTOR * GAUSSIAN_WEIGHTS[i] * irradTaps[i].xyz; } // Renormalize diffusion profiles to white vec3 normConst = GAUSSIAN_WEIGHTS[0] + GAUSSIAN_WEIGHTS[1] + GAUSSIAN_WEIGHTS[2] + GAUSSIAN_WEIGHTS[3] + GAUSSIAN_WEIGHTS[4] + GAUSSIAN_WEIGHTS[5]; diffuseLight /= normConst; // Renormalize to white diffuse light // Compute global scatter from modified TSM // TSMtap = (distance to light, u, v) vec3 TSMtap = texture(gShadowMap, LIGHT_SPACE_POS_POST_W.xy).xyz; // Four average thicknesses through the object (in mm) vec4 thickness_mm = 1.0f * -(1.0f / 0.2f) * log(vec4(irradTaps[1].w, irradTaps[2].w, irradTaps[3].w, irradTaps[4].w)); vec2 stretchTap = texture(gStretchMap, FLIPPED_UV).xy; float stretchval = 0.5f * (stretchTap.x + stretchTap.y); vec4 a_values = vec4(0.433f, 0.753f, 1.412f, 2.722f); vec4 inv_a = -1.0f / (2.0f * a_values * a_values); vec4 fades = exp(thickness_mm * thickness_mm * inv_a); float textureScale = 1024.0f * 0.1f / stretchval; float blendFactor4 = clamp(textureScale * length(FLIPPED_UV - TSMtap.yz) / (a_values.y * 0.6f), 0.0f, 1.0f); float blendFactor5 = clamp(textureScale * length(FLIPPED_UV - TSMtap.yz) / (a_values.z * 0.6f), 0.0f, 1.0f); float blendFactor6 = clamp(textureScale * length(FLIPPED_UV - TSMtap.yz) / (a_values.w * 0.6f), 0.0f, 1.0f); diffuseLight += GAUSSIAN_WEIGHTS[3] / normConst * fades.y * blendFactor4 * texture(gIrradianceMaps[3], TSMtap.yz).xyz; diffuseLight += GAUSSIAN_WEIGHTS[4] / normConst * fades.z * blendFactor5 * texture(gIrradianceMaps[4], TSMtap.yz).xyz; diffuseLight += GAUSSIAN_WEIGHTS[5] / normConst * fades.w * blendFactor6 * texture(gIrradianceMaps[5], TSMtap.yz).xyz; // Determine skin color from a diffuseColor map diffuseLight *= pow(texture(gDiffuseTexture, fs_in.TexCoords).rgb, vec3(1.0f - MIX)); vec4 specTap = texture(gSpecularTexture, FLIPPED_UV); // rho_s and m (roughness) float m = specTap.w * 0.09f + 0.23f; float rho_s = specTap.x * 0.16f + 0.18f; rho_s *= float(specTap.x > 0.1f); // Compute specular for each light vec3 specularLight = SPECULAR_CORRECTION_FACTOR * gLightColor * KS_Skin_Specular(m, rho_s); return vec4(diffuseLight + specularLight, 1.0f); }

The diffuse result from our subsurface scattering demo illustrates a greater smoothness and flush colors that really indicate a depth to what’s being shown. Obviously, texture authoring could improve, but simply modeling the light in a more physically based manner really does a lot to bring out these nuanced, subsurface interactions that add a lot of realism to what would normally be flat colors driven by the light. I mean, just take a look at the intermittent glow on those shoulders and cheeks! There’s a certain aliveness that I really can’t quite put into words.

**Resources**

The Advanced Techniques for Realistic Real-Time Skin Rendering Chapter in GPU Gems 3

S. James Lee’s Realistic Skin Rendering Resource