Faking mesh instancing in OpenGL ES 2.0

The first prototype of what became The Spatials was in full 3D. We wanted to make a game about planet exploration and colonization, with randomly generated maps, and one of the things I wanted was a lot of detail on the planet surface. Modern GPUs, including those in smartphones, are very powerful and, when fed with properly optimized code, have enormous throughput.

Mesh instancing

Say that you want to repeat a mesh many times in a 3D scene: for example a grass mesh, or maybe some weird alien equivalent. To make it look nice you want to give each mesh its own transform, so you have complete freedom over where to place it and how to rotate it. Unfortunately, changing the transform means changing a shader uniform between draws, and therefore one draw call per mesh. In plain OpenGL ES 2.0 there is no way to draw a single mesh N times, with N different transforms, in a single draw call.

Other APIs (including big brother OpenGL since version 3.1, and OpenGL ES since version 3.0) work around this issue with a feature called instancing. There are various approaches, but the basic idea is to have the hardware reissue the mesh N times while incrementing a built-in vertex shader input called gl_InstanceID. Inside the vertex shader you can then use that value to index shader constants and read the data required to transform that particular mesh instance.
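To make the idea concrete, here is a small CPU-side model of what an instanced draw does conceptually: the same vertex range is issued once per instance, and each "shader" invocation sees the instance number. This is only a sketch; draw_arrays_instanced_model, VertexShaderFn, DrawStats and count_invocations are hypothetical names, not OpenGL API, and real hardware is free to process vertices in any order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vertex shader: receives the vertex index
   within the mesh and the instance ID (the role of gl_InstanceID). */
typedef void (*VertexShaderFn)(size_t vertexID, int instanceID, void *user);

/* Conceptual model of an instanced draw: the mesh's vertexCount vertices
   are issued instanceCount times, with the instance ID going 0..count-1. */
static void draw_arrays_instanced_model(size_t vertexCount, int instanceCount,
                                        VertexShaderFn shader, void *user)
{
    assert(instanceCount >= 0 && shader != NULL);
    for (int instance = 0; instance < instanceCount; instance++) {
        for (size_t v = 0; v < vertexCount; v++) {
            shader(v, instance, user);
        }
    }
}

/* Example "shader" that records how often it ran and the highest
   instance ID it saw. */
typedef struct {
    size_t invocations;
    int maxInstance;
} DrawStats;

static void count_invocations(size_t vertexID, int instanceID, void *user)
{
    DrawStats *stats = user;
    (void)vertexID;
    stats->invocations++;
    if (instanceID > stats->maxInstance) {
        stats->maxInstance = instanceID;
    }
}
```

For example, a 36-vertex mesh drawn with 32 instances runs the "shader" 36 × 32 times, and the largest instance ID it sees is 31.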

Fake it til you make it

OpenGL ES 2.0 doesn't have support for either gl_InstanceID or glDrawElementsInstanced. So we need to fake both.

Since we don't have glDrawElementsInstanced to reissue our geometry N times, we will simply duplicate our mesh data N times in our vertex array. This wastes memory, so, as with any tradeoff, N has to be chosen carefully: larger values allow us to batch more mesh draws into a single call but require more memory.
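The tradeoff can be put into numbers. Besides memory, N is also capped by the vertex uniform budget, because every instance needs its own shader constants. OpenGL ES 2.0 guarantees at least 128 vertex uniform vectors (query the actual value with glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, ...)). The helper names below are hypothetical, and the uniform bound is an upper limit only: drivers may reserve or pad some slots.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a vertex array holding n copies of the mesh. */
static size_t instanced_vertex_bytes(size_t verticesPerMesh, size_t bytesPerVertex,
                                     size_t n)
{
    return verticesPerMesh * bytesPerVertex * n;
}

/* Draw calls needed to render objectCount instances in batches of n,
   rounded up because the last batch may be partial. */
static size_t batched_draw_calls(size_t objectCount, size_t n)
{
    assert(n > 0);
    return (objectCount + n - 1) / n;
}

/* Largest n allowed by the vertex uniform budget, after reserving some
   vectors for shared uniforms (e.g. 4 for a mat4 MVP matrix). */
static size_t max_batch_from_uniforms(size_t maxVertexUniformVectors,
                                      size_t reservedVectors,
                                      size_t vectorsPerInstance)
{
    assert(vectorsPerInstance > 0);
    if (maxVertexUniformVectors <= reservedVectors) {
        return 0;
    }
    return (maxVertexUniformVectors - reservedVectors) / vectorsPerInstance;
}
```

With two vec4 uniforms per instance and a mat4 MVP, the guaranteed 128 vectors allow at most 62 instances per batch. For a hypothetical 300-vertex mesh with 40-byte vertices (vec4 position, vec4 normal, vec2 UV), N = 32 costs 384,000 bytes and turns 1,000 draws into 32.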

For gl_InstanceID we can add a new attribute to our vertex format or, better yet, find an unused component and pass it there. In our prototype we used the W component of the normal. This value holds the instance ID that we later use in the shader to load the right shader constants. It's just an integer, repeated with the same value for all the vertices in one copy of the mesh. In pseudocode:

vertex instancedMeshes[mesh.size * N];
for (int i = 0; i < N; i++) {
    for (int j = 0; j < mesh.size; j++) {
        vertex v = mesh[j];
        v.normal.w = i;  // copy number: our fake gl_InstanceID
        instancedMeshes[i * mesh.size + j] = v;
    }
}
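The same duplication step in plain C might look like the sketch below. The Vertex layout is an assumption chosen to match the shader's attributes (vec4 position, vec4 normal, vec2 tex0uv), and build_instanced_mesh is a hypothetical helper; the result would then be uploaded once with glBufferData into a GL_ARRAY_BUFFER. Because the normal is a float attribute, small integer IDs like these are represented exactly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vertex layout matching the shader attributes. */
typedef struct {
    float position[4];
    float normal[4];   /* normal[3] (W) carries the instance ID */
    float uv[2];
} Vertex;

/* Returns a newly allocated array with n copies of mesh, where every vertex
   of copy i has normal[3] = i. The caller frees the result.
   Returns NULL if allocation fails. */
static Vertex *build_instanced_mesh(const Vertex *mesh, size_t meshSize, size_t n)
{
    assert(mesh != NULL && meshSize > 0 && n > 0);
    Vertex *out = malloc(sizeof(Vertex) * meshSize * n);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        /* copy the whole mesh, then stamp the copy number into normal.w */
        memcpy(&out[i * meshSize], mesh, sizeof(Vertex) * meshSize);
        for (size_t j = 0; j < meshSize; j++) {
            out[i * meshSize + j].normal[3] = (float)i;
        }
    }
    return out;
}
```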

We now have a new mesh, which contains N copies of our original mesh, and each vertex of each copy contains an integer in the w component of its normal that indicates its copy number. Now it's just a matter of properly building our shader and shader constants, and we will be able to draw our meshes in batches of N, with just a few calls per batch (one call to draw, and one or more calls to glUniform4fv to set up our per-instance data in shader constants).
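The pseudocode in this post draws with glDrawArrays. If you draw indexed geometry with glDrawElements instead (the non-instanced counterpart of glDrawElementsInstanced), the index buffer needs N copies as well, each offset by the number of vertices before that copy. One caveat: GL_UNSIGNED_SHORT is the widest index type core OpenGL ES 2.0 accepts (32-bit indices need the OES_element_index_uint extension), so the batch cannot exceed 65,536 vertices. build_instanced_indices is a hypothetical helper sketching this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns a newly allocated index buffer with n copies of indices, where
   copy i is offset by i * vertexCount so it addresses the i-th vertex copy.
   Returns NULL if the batch would exceed the 16-bit index range or if
   allocation fails. The caller frees the result. */
static uint16_t *build_instanced_indices(const uint16_t *indices, size_t indexCount,
                                         size_t vertexCount, size_t n)
{
    assert(indices != NULL && indexCount > 0 && n > 0);
    if (vertexCount * n > 65536) {
        return NULL;
    }
    uint16_t *out = malloc(sizeof(uint16_t) * indexCount * n);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < indexCount; k++) {
            out[i * indexCount + k] = (uint16_t)(indices[k] + i * vertexCount);
        }
    }
    return out;
}
```

A batch of k instances would then be drawn with glDrawElements(GL_TRIANGLES, k * indexCount, GL_UNSIGNED_SHORT, 0) while the duplicated index buffer is bound to GL_ELEMENT_ARRAY_BUFFER.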

Pseudocode for drawing:

// hypothetical 3D engine call that returns all the objects to be rendered
// using the current instanced mesh
world = getAllPendingInstancedObjects(...);

// from the previous pseudocode example
bindInstancedMeshes(instancedMeshes, ...);

GLfloat orientations[4 * N];
GLfloat positionsScales[4 * N];

bool remaining = true;
while (remaining) {

    int instance = 0;
    while (remaining && instance < N) {
        // this hypothetical method does 3 things:
        // writes the rotation quaternion of the current object instance to the first parameter
        // writes the position and uniform scale of the current object instance to the second parameter
        // returns whether there are any remaining object instances to render
        remaining = world.next(&orientations[instance*4], &positionsScales[instance*4]);
        instance++;
    }

    // the last batch (the "tail") may hold fewer than N instances, so upload and draw only instance entries
    glUniform4fv(uniforms[U_INSTANCE_ORIENTATIONS], instance, orientations);
    glUniform4fv(uniforms[U_INSTANCE_POSITIONSSCALES], instance, positionsScales);
    // rangeSize is just the vertex count of the non-instanced mesh
    glDrawArrays(GL_TRIANGLES, 0, instance * rangeSize);
}
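The batching logic can also be written as a plain loop over an object array, flushing whenever a batch fills up and once more for the tail. This is a sketch under assumptions: InstanceXform, FlushFn, draw_batched and count_flushed are hypothetical, and the flush callback stands in for the two glUniform4fv calls plus glDrawArrays. BATCH_N mirrors the 32-entry uniform arrays in the shader.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_N 32  /* must match the uniform array size in the shader */

/* Hypothetical per-object transform: unit quaternion (x, y, z, w) and
   position (x, y, z) with the uniform scale in w. */
typedef struct {
    float orientation[4];
    float positionScale[4];
} InstanceXform;

/* Called once per batch; a real renderer would upload the arrays with
   glUniform4fv(..., count, ...) and draw count copies of the mesh. */
typedef void (*FlushFn)(const float *orientations, const float *positionsScales,
                        int count, void *user);

/* Packs objects into batches of at most BATCH_N instances and flushes each
   one. Returns the number of batches (i.e. draw calls) issued. */
static int draw_batched(const InstanceXform *objects, size_t objectCount,
                        FlushFn flush, void *user)
{
    float orientations[4 * BATCH_N];
    float positionsScales[4 * BATCH_N];
    int instance = 0;
    int batches = 0;
    assert(flush != NULL);
    for (size_t k = 0; k < objectCount; k++) {
        memcpy(&orientations[instance * 4], objects[k].orientation, sizeof(float) * 4);
        memcpy(&positionsScales[instance * 4], objects[k].positionScale, sizeof(float) * 4);
        instance++;
        if (instance == BATCH_N) {  /* batch full: draw it and start over */
            flush(orientations, positionsScales, instance, user);
            batches++;
            instance = 0;
        }
    }
    if (instance > 0) {  /* tail batch with fewer than BATCH_N instances */
        flush(orientations, positionsScales, instance, user);
        batches++;
    }
    return batches;
}

/* Example flush callback: adds up the instance counts it receives. */
static void count_flushed(const float *orientations, const float *positionsScales,
                          int count, void *user)
{
    (void)orientations;
    (void)positionsScales;
    *(int *)user += count;
}
```

With 70 objects this issues three draws (32, 32 and 6 instances), and an empty object list issues none, without needing a look-ahead "remaining" flag.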

In this pseudocode we use quaternions instead of a full matrix to cut the required shader constants down to just two vec4 uniforms per instance, where a 4x4 matrix would take four. In the shader we then read the instance ID, use it to index the uniform arrays, fetch the position and quaternion for our mesh instance, and transform the vertex as usual.
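To see what the two vec4s replace, the sketch below expands one instance's quaternion and position/scale into the equivalent column-major 4x4 matrix, i.e. what you would otherwise upload with glUniformMatrix4fv (OpenGL ES 2.0 requires its transpose argument to be GL_FALSE, hence column-major). instance_to_mat4 and mat4_transform_point are hypothetical helpers; the quaternion layout (xyz vector part, w scalar part) follows the shader's qrot.

```c
#include <assert.h>

/* Builds translate(ps.xyz) * rotate(q) * scale(ps.w) as a column-major
   mat4. q is a unit quaternion (x, y, z, w); ps = (px, py, pz, scale). */
static void instance_to_mat4(const float q[4], const float ps[4], float m[16])
{
    assert(q && ps && m);
    float x = q[0], y = q[1], z = q[2], w = q[3], s = ps[3];
    /* rotation matrix of a unit quaternion, indexed r[row][col] */
    float r[3][3] = {
        {1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)},
        {2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)},
        {2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)},
    };
    for (int col = 0; col < 3; col++) {
        for (int row = 0; row < 3; row++) {
            m[col * 4 + row] = r[row][col] * s;  /* uniform scale applied first */
        }
        m[col * 4 + 3] = 0.0f;
    }
    m[12] = ps[0];  /* translation lives in the last column */
    m[13] = ps[1];
    m[14] = ps[2];
    m[15] = 1.0f;
}

/* Transforms the point p (implicit w = 1) by the column-major matrix m. */
static void mat4_transform_point(const float m[16], const float p[3], float out[3])
{
    for (int row = 0; row < 3; row++) {
        out[row] = m[row] * p[0] + m[4 + row] * p[1] + m[8 + row] * p[2] + m[12 + row];
    }
}
```

For a 90-degree rotation about Z, scale 2 and position (10, 0, 0), the point (1, 0, 0) lands at roughly (10, 2, 0): the same sixteen floats of information, carried in eight.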

attribute vec4 position;
attribute vec4 normal;
attribute vec2 tex0uv;

uniform mat4 modelViewProjectionMatrix;
uniform vec4 orientations[32];
uniform vec4 positionsScales[32];

varying lowp vec3 colorVarying;
varying lowp vec3 normalOut;
varying lowp vec2 tex0CoordOut;

// http://wiki.kri.googlecode.com/hg/Quaternions.wiki

//rotate vector (q must be a unit quaternion)
vec3 qrot(vec4 q, vec3 v) {
    return v + 2.0*cross(q.xyz, cross(q.xyz,v) + q.w*v);
}

//rotate vector (alternative)
vec3 qrot_2(vec4 q, vec3 v) {
    return v*(q.w*q.w - dot(q.xyz,q.xyz)) + 2.0*q.xyz*dot(q.xyz,v) + 2.0*q.w*cross(q.xyz,v);
}

//combine quaternions
vec4 qmul(vec4 a, vec4 b) {
    return vec4(cross(a.xyz,b.xyz) + a.xyz*b.w + b.xyz*a.w, a.w*b.w - dot(a.xyz,b.xyz));
}

//inverse quaternion (the conjugate, which is the inverse for unit quaternions)
vec4 qinv(vec4 q) {
    return vec4(-q.xyz,q.w);
}

void main() {
    // pop out the instance ID from the unused normal component W
    int instance = int(normal.w);

    // and use it to read the xform values for the instance
    vec4 instancedOrientation = orientations[instance];
    vec4 instancedPositionScale = positionsScales[instance];

    // another trick: store a uniform scale in the W component of the position
    vec3 instancedPosition = instancedPositionScale.xyz;
    float instancedScale = instancedPositionScale.w;

    // discard normal W (the instance ID) before using the normal
    vec3 realNormal = normal.xyz;

    // rotate the normal; this would normally be a matrix operation, but we
    // are using quaternions to optimize per-instance constant storage
    realNormal = qrot(instancedOrientation, realNormal);

    // input here your favorite per-vertex lighting formula
    colorVarying = ...;

    // scale the model-space position first, before rotation and translation
    vec3 workPosition = position.xyz * instancedScale;

    // then rotate the model by the instance quaternion
    workPosition = qrot(instancedOrientation, workPosition);

    // translate by the instance position; the shared MVP matrix then
    // takes the result to clip space
    gl_Position = modelViewProjectionMatrix * vec4(workPosition + instancedPosition, 1.0);

    tex0CoordOut = tex0uv;
    normalOut = realNormal;
}
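When debugging a shader like this, it can help to have a CPU reference of its math. The sketch below ports qrot, qrot_2, qmul and qinv from the shader to C, plus the scale, rotate, translate sequence from main(). Vec3, Vec4, v3 and instance_transform are hypothetical names; the formulas are the shader's own.

```c
#include <assert.h>

/* Plain C stand-ins for the shader's vec3 and vec4. */
typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Vec4;

static Vec3 v3(float x, float y, float z) { Vec3 r = {x, y, z}; return r; }
static Vec3 v3_add(Vec3 a, Vec3 b) { return v3(a.x + b.x, a.y + b.y, a.z + b.z); }
static Vec3 v3_mul(Vec3 a, float s) { return v3(a.x * s, a.y * s, a.z * s); }
static float v3_dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 v3_cross(Vec3 a, Vec3 b)
{
    return v3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x);
}

/* qrot: v + 2*cross(q.xyz, cross(q.xyz, v) + q.w*v), q a unit quaternion */
static Vec3 qrot(Vec4 q, Vec3 v)
{
    Vec3 u = v3(q.x, q.y, q.z);
    Vec3 t = v3_add(v3_cross(u, v), v3_mul(v, q.w));
    return v3_add(v, v3_mul(v3_cross(u, t), 2.0f));
}

/* qrot_2: v*(w*w - dot(u,u)) + 2*u*dot(u,v) + 2*w*cross(u,v) */
static Vec3 qrot_2(Vec4 q, Vec3 v)
{
    Vec3 u = v3(q.x, q.y, q.z);
    Vec3 r = v3_mul(v, q.w * q.w - v3_dot(u, u));
    r = v3_add(r, v3_mul(u, 2.0f * v3_dot(u, v)));
    return v3_add(r, v3_mul(v3_cross(u, v), 2.0f * q.w));
}

/* qmul: rotating by qmul(a, b) applies b first, then a */
static Vec4 qmul(Vec4 a, Vec4 b)
{
    Vec3 au = v3(a.x, a.y, a.z), bu = v3(b.x, b.y, b.z);
    Vec3 u = v3_add(v3_add(v3_cross(au, bu), v3_mul(au, b.w)), v3_mul(bu, a.w));
    Vec4 r = {u.x, u.y, u.z, a.w * b.w - v3_dot(au, bu)};
    return r;
}

/* qinv: the conjugate, which inverts a unit quaternion */
static Vec4 qinv(Vec4 q) { Vec4 r = {-q.x, -q.y, -q.z, q.w}; return r; }

/* Per-vertex math of main(): scale, rotate, translate.
   positionScale.xyz is the instance position, positionScale.w its scale. */
static Vec3 instance_transform(Vec4 orientation, Vec4 positionScale, Vec3 p)
{
    Vec3 scaled = v3_mul(p, positionScale.w);
    Vec3 rotated = qrot(orientation, scaled);
    return v3_add(rotated, v3(positionScale.x, positionScale.y, positionScale.z));
}
```

For instance, with the quaternion (0.5, 0.5, 0.5, 0.5), a 120-degree rotation about (1, 1, 1), both qrot and qrot_2 map the X axis to the Y axis, which makes a quick sanity check that the two variants agree.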