Polygons & textures creeping in my raymarcher

This week I’ve been working on additional features in my 64k toolchain. None of this is yet viable for 64k executables but it enhances the tool quite a bit.

My first step was implementing vertex shader support. A cool thing about vertex shaders in OpenGL is that they are responsible for outputting the vertex data; nobody said anything about requiring input. So with a function like glDrawArraysInstanced, we have free rein in the vertex shader to generate points based on gl_VertexID and gl_InstanceID.
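
For reference, the matching draw call on the Python side could look something like this. This is just a sketch: the VAO handle is a hypothetical stand-in, and I'm assuming the quads are drawn as triangle strips (100 instances would give the 10×10 grid of quads shown below).

from OpenGL.GL import glBindVertexArray, glDrawArraysInstanced, GL_TRIANGLE_STRIP

def drawQuadGrid(emptyVao):
    # emptyVao: an attribute-less VAO (a core profile context still requires one to be bound)
    glBindVertexArray(emptyVao)
    # 4 vertices per quad (as a triangle strip), 100 instances for the 10x10 grid
    glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, 100)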

Here I’m generating a grid of 10×10 quads, with some barycentric coordinates added as per this article:

#version 420

uniform mat4 uV;
uniform mat4 uVi;
uniform mat4 uP;

out vec3 bary;
out vec2 uv;

void main()
{
    vec3 local = vec3(gl_VertexID % 2, gl_VertexID / 2, 0.5) - 0.5;
    vec3 global = vec3(gl_InstanceID % 10, gl_InstanceID / 10, 4.5) - 4.5;

    uv = (local + global).xy * vec2(0.1, 0.1 * 16 / 9) + 0.5;

    bary = vec3(0);
    bary[gl_VertexID % 3] = 1.0;

    gl_Position = uP * vec4(mat3(uVi) * ((local + global - uV[3].xyz) * vec3(1,1,-1)), 1);
}

This was surprisingly easy to implement. In the tool I scan a template definition XML to figure out which shader source files to stitch together and treat as one fragment shader. Adding the distinction between .frag and .vert files allowed me to compile the resulting program with a different vertex shader than the default one, and it was up and running quite fast.
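
A rough sketch of what that stitching could look like on the Python side; the function and the default-vertex-shader fallback are made up for illustration, only the "group sources by extension, compile one program" idea comes from the text above.

from OpenGL.GL import GL_VERTEX_SHADER, GL_FRAGMENT_SHADER
from OpenGL.GL import shaders

def buildProgram(sectionPaths, defaultVertexSource):
    # split the section files by extension, then stitch each stage into one source
    vertSources, fragSources = [], []
    for path in sectionPaths:
        (vertSources if path.endswith('.vert') else fragSources).append(open(path).read())
    vertexSource = '\n'.join(vertSources) or defaultVertexSource
    fragmentSource = '\n'.join(fragSources)
    return shaders.compileProgram(shaders.compileShader(vertexSource, GL_VERTEX_SHADER),
                                  shaders.compileShader(fragmentSource, GL_FRAGMENT_SHADER))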

Next came a more interesting bit: mixing my raymarching things together with this polygonal grid.
There are two parts to this: one is matching the projection, the other is depth testing, and thus matching the depth output from the raymarcher.

To project a vertex I subtract the ray origin from the vertex and then multiply it by the inverse rotation. Apparently that flips the Z axis, so I had to fix that too. Then I multiply that by the projection matrix. The “u” prefix means uniform variable.

vec4 viewCoord = vec4(uViewInverse * ((vertex - uRayOrigin) * vec3(1, 1, -1)), 1);

My ray direction is based on mixing the corners of a frustum these days; I used to rotate the ray to get a fisheye effect, but that doesn’t fly with regular projection matrices. My frustum calculation looks something like this (before it goes into the shader as a mat4):

tanFov = tan(uniforms.get('uFovBias', 0.5))
horizontalFov = (tanFov * aspectRatio)
uniforms['uFrustum'] = (-horizontalFov, -tanFov, 1.0, 0.0,
                        horizontalFov, -tanFov, 1.0, 0.0,
                        -horizontalFov, tanFov, 1.0, 0.0,
                        horizontalFov, tanFov, 1.0, 0.0)

So I can get a projection matrix from that as well. Additionally I added a uniform for the clipRange so the raymarcher near/far planes match the polygonal ones.

uniforms['uClipRange'] = (0.01, 100.0)
near, far = uniforms['uClipRange']
projection = Mat44.frustum(-horizontalFov * near, horizontalFov * near, -tanFov * near, tanFov * near, near, far)
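
As a side note, a standard OpenGL-style frustum matrix in column-major layout looks like this; this is my own sketch of what a function like Mat44.frustum typically computes, not the tool's actual code.

def frustum(left, right, bottom, top, near, far):
    # returns the four columns of the projection matrix, matching glFrustum's definition
    return ((2.0 * near / (right - left), 0.0, 0.0, 0.0),
            (0.0, 2.0 * near / (top - bottom), 0.0, 0.0),
            ((right + left) / (right - left), (top + bottom) / (top - bottom), -(far + near) / (far - near), -1.0),
            (0.0, 0.0, -2.0 * far * near / (far - near), 0.0))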

For reference my raymarching ray looks like this:

vec4 d = mix(mix(uFrustum[0], uFrustum[1], uv.x), mix(uFrustum[2], uFrustum[3],uv.x), uv.y);
Ray ray = Ray(uV[3].xyz, normalize(d.xyz * mat3(uV)));

With this raymarching a 10x10x0.01 box matches up perfectly with the polygonal plane on top! Then the next issue is depth testing. All my render targets are now equipped with a float32 depth buffer, depth testing is enabled and before every frame I clear all depth buffers. Now I find my grid on top of my test scene because the raymarcher does not yet write the depth.

Following this nice article I learned a lot about this topic.
So to get the distance along Z I first define the world-space view axis (0, 0, -1). Dotting that with (intersection − rayOrigin), which is the same as totalDistance * rayDirection, yields the right eye-space Z distance. The rest is explained in the article. It is pretty straightforward to map that Z to gl_DepthRange using the clipping planes defined previously: first compute the NDC depth (ndcDepth) and then fit that into gl_DepthRange. One final trick is to fade to the far depth if we have 100% fog.

    vec3 viewForward = vec3(0.0, 0.0, -1.0) * mat3(uV);
    float eyeHitZ = hit.totalDistance * dot(ray.direction, viewForward);
    float ndcDepth = ((uClipRange.y + uClipRange.x) + (2 * uClipRange.y * uClipRange.x) / eyeHitZ) / (uClipRange.y - uClipRange.x);
    float z = ((gl_DepthRange.diff * ndcDepth) + gl_DepthRange.near + gl_DepthRange.far) / 2.0;
    gl_FragDepth = mix(z, gl_DepthRange.far, step(0.999, outColor0.w));

Now as if that wasn’t enough cool stuff, I added the option to bind an image file to a shot. Whenever a shot gets drawn its texture list is queried, uploaded and bound to the user defined uniform names. Uploading is cached, so every texture is loaded only once; I should probably add file watchers… The cool thing here is that not only can I now texture things, I can also enter storyboards and time them before working on actual 3D scenes!
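
The caching itself is simple; here is a minimal sketch, assuming a hypothetical uploadImage() helper that creates the GL texture and a shot object holding a uniform-name to file-path mapping.

from OpenGL.GL import (glActiveTexture, glBindTexture, glGetUniformLocation,
                       glUniform1i, GL_TEXTURE0, GL_TEXTURE_2D)

_textureCache = {}  # file path -> GL texture handle, so each file is uploaded only once

def bindShotTextures(shot, program):
    for unit, (uniformName, filePath) in enumerate(shot.textures.items()):
        if filePath not in _textureCache:
            _textureCache[filePath] = uploadImage(filePath)  # hypothetical loader
        glActiveTexture(GL_TEXTURE0 + unit)
        glBindTexture(GL_TEXTURE_2D, _textureCache[filePath])
        glUniform1i(glGetUniformLocation(program, uniformName), unit)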

Improving a renderer

This feeds into my previous write-up on the tools developed for our 64kb endeavours.

After creating Eidolon [Video] we were left with the feeling that the rendering could be a lot better. We had single-pass bloom and simple Lambert & Phong shading, no anti-aliasing and very poorly performing depth of field. Lastly, the performance hit for reflections was through the roof as well.

I started on a bunch of improvements almost immediately; most of this work was done within a month after Revision, which shows in our newest demo Yermom [Video]. I’ll go over the improvements in chronological order and credit any sources used (of which there were a lot), if I managed to document that right…

Something useful to mention: all my buffers are float32 RGBA.

Low-resolution reflections:

Basically the scene is raymarched: for every pixel there is a TraceAndShade call to render the pixel excluding fog and reflection.
From that result we do another TraceAndShade for the reflection. This makes the entire thing twice as slow when reflections are on.
Instead I early out at this point if:
if(reflectivity == 0.0 || int(gl_FragCoord.x) % 4 != 0 || int(gl_FragCoord.y) % 4 != 0) return;
That results in only 1 in 16 pixels being reflective. So instead of compositing the reflection directly I write it to a separate buffer.
Then in a later pass I composite the two buffers, where I just do a lookup in the reflection buffer like so:
texelFetch(uImages[0], ivec2(gl_FragCoord.xy)) + texelFetch(uImages[1], ivec2(gl_FragCoord.xy / 4) * 4)
In my real scenario I removed that * 4 and render to a 4 times smaller buffer instead, so reading it back results in free interpolation.
I still have glitches when blurring the reflections too much & around edges in general. Definitely still room for future improvement.

Oren Nayar diffuse light response

The original paper, and this image especially, convinced me to like this shading model for diffuse objects.

So I tried to implement that, failed a few times, got pretty close, found an accurate implementation, realized it was slow, and ended up at these two websites:
http://www.popekim.com/2011/11/optimized-oren-nayar-approximation.html
http://www.artisticexperiments.com/cg-shaders/cg-shaders-oren-nayar-fast

Those list a nifty trick to fake it. I took away some terms as I realized they contributed barely any visible difference, so I ended up with something even less accurate. I already want to revisit this, but it’s one of the improvements I wanted to share nonetheless.

float orenNayarDiffuse(float satNdotV, float satNdotL, float roughness)
{
    float lambert = satNdotL;
    if(roughness == 0.0)
        return lambert;
    float softRim = saturate(1.0 - satNdotV * 0.5);

    // my magic numbers
    float fakey = pow(lambert * softRim, 0.85);
    return mix(lambert, fakey * 0.85, roughness);
}

GGX specular

There are various open source implementations of this. I found one here:
http://filmicworlds.com/blog/optimizing-ggx-shaders-with-dotlh/
It talks about tricks to optimize things by precomputing a lookup texture; I didn’t go that far. There’s not much more I can say about this, as I don’t fully understand the math and how it differs from the basic Phong dot(N, H).

float G1V(float dotNV, float k){return 1.0 / (dotNV * (1.0 - k)+k);}

float ggxSpecular(float NdotV, float NdotL, vec3 N, vec3 L, vec3 V, float roughness)
{
    float F0 = 0.5;

    vec3 H = normalize(V + L);
    float NdotH = saturate(dot(N, H));
    float LdotH = saturate(dot(L, H));
    float a2 = roughness * roughness;

    float D = a2 / (PI * sqr(sqr(NdotH) * (a2 - 1.0) + 1.0));
    float F = F0 + (1.0 - F0) * pow(1.0 - LdotH, 5.0);
    float vis = G1V(NdotL, a2 * 0.5) * G1V(NdotV, a2 * 0.5);
    return NdotL * D * F * vis;
}

FXAA

FXAA3 to be precise. The whitepaper is quite clear, and besides, why bother rewriting it yourself if it’s open source? I can’t remember which one I used, but here are a few links:
https://gist.github.com/kosua20/0c506b81b3812ac900048059d2383126
https://github.com/urho3d/Urho3D/blob/master/bin/CoreData/Shaders/GLSL/FXAA3.glsl
https://github.com/vispy/experimental/blob/master/fsaa/fxaa.glsl
Preprocessed and minified for preset 12, it becomes very small in a compressed executable. Figured I’d just share it.

#version 420
uniform vec3 uTimeResolution;uniform sampler2D uImages[1];out vec4 z;float aa(vec3 a){vec3 b=vec3(.299,.587,.114);return dot(a,b);}
#define bb(a)texture(uImages[0],a)
#define cc(a)aa(texture(uImages[0],a).rgb)
#define dd(a,b)aa(texture(uImages[0],a+(b*c)).rgb)
void main(){vec2 a=gl_FragCoord.xy/uTimeResolution.yz,c=1/uTimeResolution.yz;vec4 b=bb(a);b.y=aa(b.rgb);float d=dd(a,vec2(0,1)),e=dd(a,vec2(1,0)),f=dd(a,vec2(0,-1)),g=dd(a,vec2(-1,0)),h=max(max(f,g),max(e,max(d,b.y))),i=h-min(min(f,g),min(e,min(d,b.y)));if(i<max(.0833,h*.166)){z=bb(a);return;}h=dd(a,vec2(-1,-1));float j=dd(a,vec2( 1,1)),k=dd(a,vec2( 1,-1)),l=dd(a,vec2(-1,1)),m=f+d,n=g+e,o=k+j,p=h+l,q=c.x;
bool r=abs((-2*g)+p)+(abs((-2*b.y)+m)*2)+abs((-2*e)+o)>=abs((-2*d)+l+j)+(abs((-2*b.y)+n)*2)+abs((-2*f)+h+k);if(!r){f=g;d=e;}else q=c.y;h=f-b.y,e=d-b.y,f=f+b.y,d=d+b.y,g=max(abs(h),abs(e));i=clamp((abs((((m+n)*2+p+o)*(1./12))-b.y)/i),0,1);if(abs(e)<abs(h))q=-q;else f=d;vec2 s=a,t=vec2(!r?0:c.x,r?0:c.y);if(!r)s.x+=q*.5;else s.y+=q*.5;
vec2 u=vec2(s.x-t.x,s.y-t.y);s=vec2(s.x+t.x,s.y+t.y);j=((-2)*i)+3;d=cc(u);e=i*i;h=cc(s);g*=.25;i=b.y-f*.5;j=j*e;d-=f*.5;h-=f*.5;bool v,w,x,y=i<0;
#define ee(Q) v=abs(d)>=g;w=abs(h)>=g;if(!v)u.x-=t.x*Q;if(!v)u.y-=t.y*Q;x=(!v)||(!w);if(!w)s.x+=t.x*Q;if(!w)s.y+=t.y*Q;
#define ff if(!v)d=cc(u.xy);if(!w)h=cc(s.xy);if(!v)d=d-f*.5;if(!w)h=h-f*.5;
ee(1.5)if(x){ff ee(2.)if(x){ff ee(4.)if(x){ff ee(12.)}}}e=a.x-u.x;f=s.x-a.x;if(!r){e=a.y-u.y;f=s.y-a.y;}q*=max((e<f?(d<0)!=y:(h<0)!=y)?(min(e,f)*(-1/(f+e)))+.5:0,j*j*.75);if(!r)a.x+=q;else a.y+=q;z=bb(a);}

Multi pass bloom

The idea for this one was heavily inspired by this asset for Unity:

https://www.assetstore.unity3d.com/en/#!/content/17324

I’m quite sure the technique is not original, but that’s where I got the idea.

The idea is to downsample and blur at many resolutions and then combine the (weighted) results to get a very high quality full-screen blur.
So basically downsample by a factor of 2 (to a quarter of the pixels) using this shader:

#version 420

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];

out vec4 outColor0;

void main()
{
    outColor0 = 0.25 * (texture(uImages[0], (gl_FragCoord.xy + vec2(-0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(0.5, -0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(0.5, 0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(-0.5, 0.5)) / uResolution));
}

Then downsample that again, and recurse until we reach a factor of 64.

All the downsamples fit in the backbuffer, so in theory that, together with the first blur pass, can be done in one go using the backbuffer as a sampler2D as well. But to avoid the hassle of figuring out the correct (clamped!) UV coordinates I just use a ton of passes.
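
The chain of target sizes is trivial to compute; a small sketch (my own, not the tool's code): six buffers for factors 2 through 64, which matches uImages[1] to uImages[6] in the combine shader further down.

def downsampleChain(width, height, maxFactor=64):
    # buffer sizes for downsample factors 2, 4, 8, ..., maxFactor
    sizes = []
    factor = 2
    while factor <= maxFactor:
        sizes.append((width // factor, height // factor))
        factor *= 2
    return sizes

# e.g. downsampleChain(1280, 720) -> [(640, 360), (320, 180), (160, 90), (80, 45), (40, 22), (20, 11)]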

Then take all these downsampled buffers and ping pong them for blur passes, so for each buffer:
HBLUR taking steps of 2 pixels, into a buffer of the same size
VBLUR, back into the initial downsampled buffer
HBLUR taking steps of 3 pixels, reuse the HBLUR buffer
VBLUR, reuse the initial downsampled buffer

The pixel step is passed in as uBlurSize, and the blur direction as uDirection; a sketch of how these passes could be chained follows after the shader.

#version 420

out vec4 color;

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];
uniform vec2 uDirection;
uniform float uBlurSize;

const float curve[7] = { 0.0205,
    0.0855,
    0.232,
    0.324,
    0.232,
    0.0855,
    0.0205 };

void main()
{
    vec2 uv = gl_FragCoord.xy / uResolution;
    vec2 netFilterWidth = uDirection / uResolution * uBlurSize;
    vec2 coords = uv - netFilterWidth * 3.0;

    color = vec4(0);
    for( int l = 0; l < 7; l++ )
    {
        vec4 tap = texture(uImages[0], coords);
        color += tap * curve[l];
        coords += netFilterWidth;
    }
}
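
The ping-pong sequence from the list above could be driven from the tool side roughly like this; drawPass() and the buffer handles are hypothetical stand-ins for whatever the tool uses.

def blurDownsampledBuffer(downsampled, temp, drawPass):
    # two blur rounds per buffer: horizontal into a temp target, vertical back again
    for blurSize in (2.0, 3.0):
        drawPass(source=downsampled, target=temp,
                 uniforms={'uDirection': (1.0, 0.0), 'uBlurSize': blurSize})
        drawPass(source=temp, target=downsampled,
                 uniforms={'uDirection': (0.0, 1.0), 'uBlurSize': blurSize})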

Lastly we combine the passes with lens dirt. uImages[0] is the original backbuffer, 1-6 are all the downsampled and blurred buffers, 7 is a lens dirt image.
My lens dirt texture is pretty poor; it’s just a precalculated texture with randomly scaled and colored circles and hexagons, sometimes filled and sometimes outlined.
I don’t think I actually ever used the lens dirt or bloom intensity as uniforms.

#version 420

out vec4 color;

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[8];
uniform float uBloom = 0.04;
uniform float uLensDirtIntensity = 0.3;

void main()
{
    vec2 coord = gl_FragCoord.xy / uResolution;
    color = texture(uImages[0], coord);

    vec3 b0 = texture(uImages[1], coord).xyz;
    vec3 b1 = texture(uImages[2], coord).xyz * 0.6; // dampen to have less banding in gamma space
    vec3 b2 = texture(uImages[3], coord).xyz * 0.3; // dampen to have less banding in gamma space
    vec3 b3 = texture(uImages[4], coord).xyz;
    vec3 b4 = texture(uImages[5], coord).xyz;
    vec3 b5 = texture(uImages[6], coord).xyz;

    vec3 bloom = b0 * 0.5
        + b1 * 0.6
        + b2 * 0.6
        + b3 * 0.45
        + b4 * 0.35
        + b5 * 0.23;

    bloom /= 2.2;
    color.xyz = mix(color.xyz, bloom.xyz, uBloom);

    vec3 lens = texture(uImages[7], coord).xyz;
    vec3 lensBloom = b0 + b1 * 0.8 + b2 * 0.6 + b3 * 0.45 + b4 * 0.35 + b5 * 0.23;
    lensBloom /= 3.2;
    color.xyz = mix(color.xyz, lensBloom, (clamp(lens * uLensDirtIntensity, 0.0, 1.0)));
    
    color.xyz = pow(color.xyz, vec3(1.0 / 2.2));
}

White lines on a cube, brightness of 10.

White lines on a cube, brightness of 300.

Sphere tracing algorithm

Instead of the rather naive sphere tracing loop that I used in a lot of 4kb productions and can just write by heart, I went for this paper:
http://erleuchtet.org/~cupe/permanent/enhanced_sphere_tracing.pdf
It is a clever technique that involves overstepping and backtracking only when necessary, as well as keeping track of the pixel size in 3D to realize when there is no need to compute more detail. The paper is full of code snippets and clear infographics; I don’t think I could explain it any more clearly.
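
To give an impression of the core idea, here is a rough Python transcription of that over-relaxed loop; the constants and names are mine, see the paper for the real algorithm.

def sphereTrace(origin, direction, sdf, tMax=100.0, maxSteps=128, omega=1.3, pixelRadius=0.001):
    # origin/direction: numpy-style vectors, sdf: callable returning the signed distance
    # over-relaxation: step omega * distance, fall back to a plain step when we overshot
    t, prevRadius, stepLength = 0.0, 0.0, 0.0
    candidateT, candidateError = tMax, 1e30
    for _ in range(maxSteps):
        radius = abs(sdf(origin + t * direction))
        overshot = omega > 1.0 and (radius + prevRadius) < stepLength
        if overshot:
            stepLength -= omega * stepLength  # back up to the last safe position
            omega = 1.0                       # and disable relaxation from here on
        else:
            stepLength = omega * radius
        prevRadius = radius
        error = radius / max(t, 1e-6)  # roughly: how large this distance is on screen
        if not overshot and error < candidateError:
            candidateT, candidateError = t, error
        if (not overshot and error < pixelRadius) or t > tMax:
            break
        t += stepLength
    return candidateT if candidateError < pixelRadius else None  # None = no hit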

Beauty shots

Depth of field

I initially only knew how to do good circular DoF, until this one came along: https://www.shadertoy.com/view/4tK3WK
I used that initially, but getting it to look good was really expensive because it is all single pass. Then I looked into a 3-blur-pass solution, which sort of worked, but when I went looking for more optimized versions I found this 2-pass one: https://www.shadertoy.com/view/Xd3GDl. It works extremely well; the only edge cases I found were when unfocusing a regular grid of bright points.

Here’s what I wrote to get it to work with a depth buffer (depth based blur):

const int NUM_SAMPLES = 16;

void main()
{
    vec2 fragCoord = gl_FragCoord.xy;

    const vec2 blurdir = vec2( 0.0, 1.0 );
    vec2 blurvec = (blurdir) / uResolution;
    vec2 uv = fragCoord / uResolution.xy;

    float z = texture(uImages[0], uv).w;
    fragColor = vec4(depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES), z);
}

Second pass:

const int NUM_SAMPLES = 16;

void main()
{
    vec2 uv = gl_FragCoord.xy / uResolution;

    float z = texture(uImages[0], uv).w;

    vec2 blurdir = vec2(1.0, 0.577350269189626);
    vec2 blurvec = normalize(blurdir) / uResolution;
    vec3 color0 = depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES);

    blurdir = vec2(-1.0, 0.577350269189626);
    blurvec = normalize(blurdir) / uResolution;
    vec3 color1 = depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES);

    vec3 color = min(color0, color1);
    fragColor = vec4(color, 1.0);
}

Shared header:

#version 420

// default uniforms
uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];

uniform float uSharpDist = 15; // distance from camera that is 100% sharp
uniform float uSharpRange = 0; // distance from the sharp center that remains sharp
uniform float uBlurFalloff = 1000; // distance from the edge of the sharp range it takes to become 100% blurry
uniform float uMaxBlur = 16; // radius of the blur in pixels at 100% blur

float CoC(float z)
{
    return uMaxBlur * min(1, max(0, abs(z - uSharpDist) - uSharpRange) / uBlurFalloff);
}

out vec4 fragColor;

//note: uniform pdf rand [0;1)
float hash1(vec2 p)
{
    p = fract(p * vec2(5.3987, 5.4421));
    p += dot(p.yx, p.xy + vec2(21.5351, 14.3137));
    return fract(p.x * p.y * 95.4307);
}

#define USE_RANDOM

vec3 depthDirectionalBlur(float z, float coc, vec2 uv, vec2 blurvec, int numSamples)
{
    // z: z at UV
    // coc: blur radius at UV
    // uv: initial coordinate
    // blurvec: smudge direction
    // numSamples: blur taps
    vec3 sumcol = vec3(0.0);

    for (int i = 0; i < numSamples; ++i)
    {
        float r =
            #ifdef USE_RANDOM
            (i + hash1(uv + float(i + uTime)) - 0.5)
            #else
            i
            #endif
            / float(numSamples - 1) - 0.5;
        vec2 p = uv + r * coc * blurvec;
        vec4 smpl = texture(uImages[0], p);
        if(smpl.w < z) // if the sample is closer, use its CoC instead
        {
            p = uv + r * CoC(smpl.w) * blurvec;
            smpl = texture(uImages[0], p);
        }
        sumcol += smpl.xyz;
    }

    sumcol /= float(numSamples);
    sumcol = max(sumcol, 0.0);

    return sumcol;
}

Additional sources used for a longer time

Distance function library

http://mercury.sexy/hg_sdf/
A very cool site explaining all kinds of things you can do with this code. I think many of these functions had been invented already, but this adds some bonuses, as well as a very clear code style and excellent documentation, making it fully accessible.
For an introduction to this library:
https://www.youtube.com/watch?v=T-9R0zAwL7s

Noise functions

https://www.shadertoy.com/view/4djSRW
The hashes are optimized so that only hash4() is really implemented and the rest is just swizzling and redirecting, so a float-based hash is just:

float hash1(float x){return hash4(vec4(x)).x;}
vec2 hash2(float x){return hash4(vec4(x)).xy;}

And so on.

Value noise
https://www.shadertoy.com/view/4sfGzS
https://www.shadertoy.com/view/lsf3WH

Voronoi 2D
https://www.shadertoy.com/view/llG3zy
Voronoi is great: using the center distance we get Worley noise instead, and we can track cell indices for randomization.
This is fairly fast, but still too slow to do in real time, so I implemented tileable 2D & 3D versions.

Perlin
Layering the value noise for N iterations, scaling the UV by 2 and weight by 0.5 in every iteration.
These could be controllable parameters for various different looks. A slower weight decrease results in a more wood-grain look for example.

float perlin(vec2 p, int iterations)
{
    float f = 0.0;
    float amplitude = 1.0;

    for (int i = 0; i < iterations; ++i)
    {
        f += snoise(p) * amplitude;
        amplitude *= 0.5;
        p *= 2.0;
    }

    return f * 0.5;
}

Now the Perlin logic can be applied to Worley noise (Voronoi center distance) to get billows. I did the same for the Voronoi edges, all tileable in 2D and 3D for texture precalc. Here’s an example: basically the modulo in the snoise function is the only thing necessary to make things tileable; the Perlin layering then just uses that and keeps track of the scale for each layer.

float snoise_tiled(vec2 p, float scale)
{
    p *= scale;
    vec2 c = floor(p);
    vec2 f = p - c;
    f = f * f * (3.0 - 2.0 * f);
    return mix(mix(hash1(mod(c + vec2(0.0, 0.0), scale) + 10.0),
                   hash1(mod(c + vec2(1.0, 0.0), scale) + 10.0), f.x),
               mix(hash1(mod(c + vec2(0.0, 1.0), scale) + 10.0),
                   hash1(mod(c + vec2(1.0, 1.0), scale) + 10.0), f.x), f.y);
}
float perlin_tiled(vec2 p, float scale, int iterations)
{
    float f = 0.0;
    p = mod(p, scale);
    float amplitude = 1.0;
    
    for (int i = 0; i < iterations; ++i)
    {
        f += snoise_tiled(p, scale) * amplitude;
        amplitude *= 0.5;
        scale *= 2.0;
    }

    return f * 0.5;
}

Part 3: Importing and drawing a custom mesh file

Part 3: Creating an importer

This is part 3 of a series and it is about getting started with visualizing triangle meshes with Python 2.7 using the libraries PyOpenGL and PyQt4.

Part 1
Part 2
Part 3

I will assume you know Python; you will not need a lot of Qt or OpenGL experience, though I will also not go into the deeper details of how OpenGL works. For that I refer you to the official documentation and the excellent (C++) tutorials at https://open.gl/. Although they are C++, there is a lot of explanation about OpenGL and why to do certain calls in a certain order.

On a final note: I will make generalizations and simplifications when explaining things. If you think something works differently than I say, it probably does; this is to try and convey ideas to beginners, not to explain low-level OpenGL implementations.

3.1 Importing

With our file format resembling OpenGL so closely, this step is relatively easy. First I’ll declare some globals: because OpenGL does not have real enums, just a bunch of global constants, I make some groups to test and map data against.

from OpenGL.GL import *

attributeElementTypes = (GL_BYTE,
                        GL_UNSIGNED_BYTE,
                        GL_SHORT,
                        GL_UNSIGNED_SHORT,
                        GL_INT,
                        GL_UNSIGNED_INT,
                        GL_HALF_FLOAT,
                        GL_FLOAT,
                        GL_DOUBLE,
                        GL_FIXED,
                        GL_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_10F_11F_11F_REV)
sizeOfType = {GL_BYTE: 1,
             GL_UNSIGNED_BYTE: 1,
             GL_SHORT: 2,
             GL_UNSIGNED_SHORT: 2,
             GL_INT: 4,
             GL_UNSIGNED_INT: 4,
             GL_HALF_FLOAT: 2,
             GL_FLOAT: 4,
             GL_DOUBLE: 8,
             GL_FIXED: 4,
             GL_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_10F_11F_11F_REV: 4}
drawModes = (GL_POINTS,
            GL_LINE_STRIP,
            GL_LINE_LOOP,
            GL_LINES,
            GL_LINE_STRIP_ADJACENCY,
            GL_LINES_ADJACENCY,
            GL_TRIANGLE_STRIP,
            GL_TRIANGLE_FAN,
            GL_TRIANGLES,
            GL_TRIANGLE_STRIP_ADJACENCY,
            GL_TRIANGLES_ADJACENCY,
            GL_PATCHES)
indexTypeFromSize = {1: GL_UNSIGNED_BYTE, 2: GL_UNSIGNED_SHORT, 4: GL_UNSIGNED_INT}

Next up is a Mesh class that stores a vertex array object (and corresponding buffers for deletion) along with all info necessary to draw the mesh once it’s on the GPU.

class Mesh(object):
    def __init__(self, vao, bufs, drawMode, indexCount, indexType):
        self.__vao = vao
        self.__bufs = bufs
        self.__drawMode = drawMode
        self.__indexCount = indexCount
        self.__indexType = indexType
 
    def __del__(self):
        glDeleteBuffers(len(self.__bufs), self.__bufs)
        glDeleteVertexArrays(1, [self.__vao])
 
    def draw(self):
        glBindVertexArray(self.__vao)
        glDrawElements(self.__drawMode, self.__indexCount, self.__indexType, None)

Now let’s, given a file path, open up the file and run the importer for the right version (if known).

def model(filePath):
    vao = glGenVertexArrays(1)
    glBindVertexArray(vao)
    bufs = glGenBuffers(2)
    glBindBuffer(GL_ARRAY_BUFFER, bufs[0])
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufs[1])
    with open(filePath, 'rb') as fh:
        fileVersion = struct.unpack('B', fh.read(1))[0]
        if fileVersion == 0:
            return _loadMesh_v0(fh, vao, bufs)
        raise RuntimeError('Unknown mesh file version %s in %s' % (fileVersion, filePath))

Next we can start reading the rest of the file. This and the following snippets form the body of _loadMesh_v0, shown in full at the end:

    vertexCount = struct.unpack('I', fh.read(4))[0]
    vertexSize = struct.unpack('B', fh.read(1))[0]
    indexCount = struct.unpack('I', fh.read(4))[0]
    indexSize = struct.unpack('B', fh.read(1))[0]
    assert indexSize in indexTypeFromSize, 'Unknown element data type, element size must be one of %s' % indexTypeFromSize.keys()
    indexType = indexTypeFromSize[indexSize]
    drawMode = struct.unpack('I', fh.read(4))[0]
    assert drawMode in (GL_LINES, GL_TRIANGLES), 'Unknown draw mode.'  # TODO: list all render types

Read and apply the attribute layout:

# gather layout
numAttributes = struct.unpack('B', fh.read(1))[0]
offset = 0
layouts = []
for i in xrange(numAttributes):
   location = struct.unpack('B', fh.read(1))[0]
   dimensions = struct.unpack('B', fh.read(1))[0]
   assert dimensions in (1, 2, 3, 4)
   dataType = struct.unpack('I', fh.read(4))[0]
   assert dataType in attributeElementTypes, 'Invalid GLenum value for attribute element type.'
   layouts.append((location, dimensions, dataType, offset))
   offset += dimensions * sizeOfType[dataType]
# apply
for layout in layouts:
   glVertexAttribPointer(layout[0], layout[1], layout[2], GL_FALSE, offset, ctypes.c_void_p(layout[3]))  # total offset is now stride
   glEnableVertexAttribArray(layout[0])

Read and upload the raw buffer data. This step is easy because we can directly copy the bytes; the storage matches exactly what OpenGL expects thanks to the layout code above.

raw = fh.read(vertexSize * vertexCount)
glBufferData(GL_ARRAY_BUFFER, vertexSize * vertexCount, raw, GL_STATIC_DRAW)
raw = fh.read(indexSize * indexCount)
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexSize * indexCount, raw, GL_STATIC_DRAW)

3.2 The final code

This is the application code including all the rendering from part 1; only the rectangle has been replaced by the loaded mesh.

# the importer
import ctypes  # for ctypes.c_void_p offsets in glVertexAttribPointer
import struct
from OpenGL.GL import *

attributeElementTypes = (GL_BYTE,
                        GL_UNSIGNED_BYTE,
                        GL_SHORT,
                        GL_UNSIGNED_SHORT,
                        GL_INT,
                        GL_UNSIGNED_INT,
                        GL_HALF_FLOAT,
                        GL_FLOAT,
                        GL_DOUBLE,
                        GL_FIXED,
                        GL_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_10F_11F_11F_REV)
sizeOfType = {GL_BYTE: 1,
             GL_UNSIGNED_BYTE: 1,
             GL_SHORT: 2,
             GL_UNSIGNED_SHORT: 2,
             GL_INT: 4,
             GL_UNSIGNED_INT: 4,
             GL_HALF_FLOAT: 2,
             GL_FLOAT: 4,
             GL_DOUBLE: 8,
             GL_FIXED: 4,
             GL_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_10F_11F_11F_REV: 4}
drawModes = (GL_POINTS,
            GL_LINE_STRIP,
            GL_LINE_LOOP,
            GL_LINES,
            GL_LINE_STRIP_ADJACENCY,
            GL_LINES_ADJACENCY,
            GL_TRIANGLE_STRIP,
            GL_TRIANGLE_FAN,
            GL_TRIANGLES,
            GL_TRIANGLE_STRIP_ADJACENCY,
            GL_TRIANGLES_ADJACENCY,
            GL_PATCHES)
indexTypeFromSize = {1: GL_UNSIGNED_BYTE, 2: GL_UNSIGNED_SHORT, 4: GL_UNSIGNED_INT}


def _loadMesh_v0(fh, vao, bufs):
    vertexCount = struct.unpack('I', fh.read(4))[0]
    vertexSize = struct.unpack('B', fh.read(1))[0]
    indexCount = struct.unpack('I', fh.read(4))[0]
    indexSize = struct.unpack('B', fh.read(1))[0]
    assert indexSize in indexTypeFromSize, 'Unknown element data type, element size must be one of %s' % indexTypeFromSize.keys()
    indexType = indexTypeFromSize[indexSize]
    drawMode = struct.unpack('I', fh.read(4))[0]
    assert drawMode in (GL_LINES, GL_TRIANGLES), 'Unknown draw mode.'  # TODO: list all render types
  
    # gather layout
    numAttributes = struct.unpack('B', fh.read(1))[0]
    offset = 0
    layouts = []
    for i in xrange(numAttributes):
        location = struct.unpack('B', fh.read(1))[0]
        dimensions = struct.unpack('B', fh.read(1))[0]
        assert dimensions in (1, 2, 3, 4)
        dataType = struct.unpack('I', fh.read(4))[0]
        assert dataType in attributeElementTypes, 'Invalid GLenum value for attribute element type.'
        layouts.append((location, dimensions, dataType, offset))
        offset += dimensions * sizeOfType[dataType]
  
    # apply layout
    for layout in layouts:
        glVertexAttribPointer(layout[0], layout[1], layout[2], GL_FALSE, offset, ctypes.c_void_p(layout[3]))  # total offset is now stride
        glEnableVertexAttribArray(layout[0])
  
    raw = fh.read(vertexSize * vertexCount)
    glBufferData(GL_ARRAY_BUFFER, vertexSize * vertexCount, raw, GL_STATIC_DRAW)
    raw = fh.read(indexSize * indexCount)
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexSize * indexCount, raw, GL_STATIC_DRAW)
  
    assert len(fh.read()) == 0, 'Expected end of file, but file is longer than it indicates'
    return Mesh(vao, bufs, drawMode, indexCount, indexType)


class Mesh(object):
    def __init__(self, vao, bufs, drawMode, indexCount, indexType):
        self.__vao = vao
        self.__bufs = bufs
        self.__drawMode = drawMode
        self.__indexCount = indexCount
        self.__indexType = indexType
  
    def __del__(self):
        glDeleteBuffers(len(self.__bufs), self.__bufs)
        glDeleteVertexArrays(1, [self.__vao])
  
    def draw(self):
        glBindVertexArray(self.__vao)
        glDrawElements(self.__drawMode, self.__indexCount, self.__indexType, None)


def model(filePath):
    vao = glGenVertexArrays(1)
    glBindVertexArray(vao)
    bufs = glGenBuffers(2)
    glBindBuffer(GL_ARRAY_BUFFER, bufs[0])
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufs[1])
    with open(filePath, 'rb') as fh:
        fileVersion = struct.unpack('B', fh.read(1))[0]
        if fileVersion == 0:
            return _loadMesh_v0(fh, vao, bufs)
        raise RuntimeError('Unknown mesh file version %s in %s' % (fileVersion, filePath))


# import the necessary modules
import time
from PyQt4.QtCore import *  # QTimer
from PyQt4.QtGui import *  # QApplication
from PyQt4.QtOpenGL import *  # QGLWidget
from OpenGL.GL import *  # OpenGL functionality


# this is the basic window
class OpenGLView(QGLWidget):
    def initializeGL(self):
        # set the RGBA values of the background
        glClearColor(0.1, 0.2, 0.3, 1.0)
  
        # set a timer to redraw every 1/60th of a second
        self.__timer = QTimer()
        self.__timer.timeout.connect(self.repaint)
        self.__timer.start(1000 / 60)
  
        # import a model
        self.__mesh = model(r'C:\Users\John\Python\maya\cube.bm')
  
    def resizeGL(self, width, height):
        glViewport(0, 0, width, height)
  
    def paintGL(self):
        glLoadIdentity()
        glScalef(self.height() / float(self.width()), 1.0, 1.0)
        glRotate((time.time() % 36.0) * 10, 0, 0, 1)
  
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        self.__mesh.draw()


# this initializes Qt
app = QApplication([])
# this creates the openGL window, but it isn't initialized yet
window = OpenGLView()
# this only schedules the window to be shown on the next Qt update
window.show()
# this starts the Qt main update loop, it prevents Python from continuing beyond this line
# and any Qt stuff we did above is now going to actually get executed, along with any future
# events like mouse clicks and window resizes
app.exec_()

Part 1: Drawing with PyOpenGL using modern OpenGL buffers.

This is part 1 of a series and it is about getting started with visualizing triangle meshes with Python 2.7 using the libraries PyOpenGL and PyQt4 [1].

Part 1
Part 2
Part 3

I will assume you know Python; you will not need a lot of Qt or OpenGL experience, though I will also not go into the deeper details of how OpenGL works. For that I refer you to the official documentation and the excellent (C++) tutorials at https://open.gl/. Although they are C++, there is a lot of explanation about OpenGL and why to do certain calls in a certain order.

On a final note: I will make generalizations and simplifications when explaining things. If you think something works differently than I say, it probably does; this is to try and convey ideas to beginners, not to explain low-level OpenGL implementations.

Part 1: Drawing a mesh using buffers.

1.1. Setting up

Download & run Python with default settings:
https://www.python.org/ftp/python/2.7.12/python-2.7.12.amd64.msi

Download & run PyQt4 with default settings:
https://sourceforge.net/projects/pyqt/files/PyQt4/PyQt-4.11.4/PyQt4-4.11.4-gpl-Py2.7-Qt4.8.7-x64.exe/download

Paste the following in a Windows command window (Windows key + R -> type “cmd.exe” -> hit enter):
C:/Python27/Scripts/pip install setuptools
C:/Python27/Scripts/pip install PyOpenGL

1.2. Creating an OpenGL enabled window in Qt.

The first thing to know about OpenGL is that any operation requires OpenGL to be initialized. OpenGL is not something you just “import”, it has to be attached to a (possibly hidden) window. This means that any file loading or global initialization has to be postponed until OpenGL is available.

The second thing to know about OpenGL is that it is a big state machine. Any setting you change is left until you manually set it back. This means in Python we may want to create some contexts (using contextlib) to manage the safe setting and unsetting of certain states. I will however not go this far.
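
Just to illustrate that contextlib idea (we won't use it in this series), a state toggle could be wrapped like this:

from contextlib import contextmanager
from OpenGL.GL import glEnable, glDisable, glIsEnabled, GL_DEPTH_TEST

@contextmanager
def glEnabled(flag):
    # remember the previous state, enable it, and restore it when the block ends
    wasEnabled = glIsEnabled(flag)
    glEnable(flag)
    try:
        yield
    finally:
        if not wasEnabled:
            glDisable(flag)

# usage, once OpenGL is initialized:
# with glEnabled(GL_DEPTH_TEST):
#     drawSomething()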

Similar to OpenGL, Qt also requires prior initialization. So here’s some relevant code:

# import the necessary modules
import ctypes  # used to build raw C arrays for glBufferData
from ctypes import sizeof  # byte size of a C type
from PyQt4.QtCore import * # QTimer
from PyQt4.QtGui import * # QApplication
from PyQt4.QtOpenGL import * # QGLWidget
from OpenGL.GL import * # OpenGL functionality
from OpenGL.GL import shaders # Utilities to compile shaders, we may not actually use this

# this is the basic window
class OpenGLView(QGLWidget):
    def initializeGL(self):
        # here openGL is initialized and we can do our real program initialization
        pass

    def resizeGL(self, width, height):
        # openGL remembers how many pixels it should draw,
        # so every resize we have to tell it what the new window size is it is supposed
        # to be drawing for
        pass

    def paintGL(self):
        # here we can start drawing, on show and on resize the window will redraw
        # automatically
        pass

# this initializes Qt
app = QApplication([])
# this creates the openGL window, but it isn't initialized yet
window = OpenGLView()
# this only schedules the window to be shown on the next Qt update
window.show()
# this starts the Qt main update loop, it prevents Python from continuing beyond this
# line and any Qt stuff we did above is now going to actually get executed, along with
# any future events like mouse clicks and window resizes
app.exec_()

Running this should get you a black window that is OpenGL enabled. So let’s fill in the view class to draw something in real-time. This will show you how to make your window update at 60-fps-ish, how to set a background color and how to handle resizes.

class OpenGLView(QGLWidget):
    def initializeGL(self):
        # set the RGBA values of the background
        glClearColor(0.1, 0.2, 0.3, 1.0)
        # set a timer to redraw every 1/60th of a second
        self.__timer = QTimer()
        self.__timer.timeout.connect(self.repaint) # make it repaint when triggered
        self.__timer.start(1000 / 60) # make it trigger every 1000/60 milliseconds
   
    def resizeGL(self, width, height):
        # this tells openGL how many pixels it should be drawing into
        glViewport(0, 0, width, height)
   
    def paintGL(self):
        # empty the screen, setting only the background color
        # the depth_buffer_bit also clears the Z-buffer, which is used to make sure
        # objects that are behind other objects are not shown: naively, drawing
        # a faraway object after a nearby object would just overwrite those pixels
        # with itself, but if there is already an object there the depth buffer
        # handles checking whether the new pixel is closer or not automatically
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        # the openGL window has coordinates from (-1,-1) to (1,1), so this fills in 
        # the top right corner with a rectangle. The default color is white.
        glRecti(0, 0, 1, 1)

Note that the QTimer forces the screen to redraw, but because we are not animating any data this will not be visible right now.

1.3. Creating a Vertex Array Object (VAO)

In OpenGL there is a lot we can put into a mesh: not only positions of vertices, but also triangulation patterns, vertex colors, texture coordinates, normals, etcetera. Because OpenGL is a state machine (as described at the start of 1.2) this means that when drawing 2 different models a lot of settings need to be swapped before we can draw them. This is why the VAO was created: it is a way to group settings together and be able to draw a mesh (once set up properly) in only 2 calls. It is not less code, but it allows us to move more code to the initialization stage, gaining performance and reducing the risk of errors, which makes debugging easier.

Our mesh however will not be very complicated. We require 2 sets of data, the vertex positions and the triangulation (3 integers per triangle pointing to what vertex to use for this triangle).
[Figure: a quad made of 4 vertices and 2 triangles]
As you can see this would result in the following data:
Positions = [0, 0, 1, 0, 0, 1, 1, 1]
Elements = [0, 1, 2, 1, 3, 2]
4 2D vertices and 2 triangles made of 3 indices each.

So let’s give this data to a VAO at the bottom of initializeGL.

# generate a model
# set up the data
positions = (0, 0, 1, 0, 0, 1, 1, 1)
elements = (0, 1, 2, 1, 3, 2)
# apply the data
# generate a vertex array object so we can easily draw the resulting mesh later
self.__vao = glGenVertexArrays(1)
# enable the vertex array before doing anything else, so anything we do is captured in the VAO context
glBindVertexArray(self.__vao)
# generate 2 buffers, 1 for positions, 1 for elements. this is memory on the GPU that our model will be saved in.
bufs = glGenBuffers(2)
# set the first buffer for the main vertex data, that GL_ARRAY_BUFFER indicates that use case
glBindBuffer(GL_ARRAY_BUFFER, bufs[0])
# upload the position data to the GPU
# some info about the arguments:
# GL_ARRAY_BUFFER: this is the buffer we are uploading into, that is why we first had to bind the created buffer, else we'd be uploading to nothing
# sizeof(ctypes.c_float) * len(positions): openGL wants our data as raw C pointer, and for that it needs to know the size in bytes.
# the ctypes module helps us figure out the size in bytes of a single number, then we just multiply that by the array length
# (ctypes.c_float * len(positions))(*positions): this is a way to convert a python list or tuple to a ctypes array of the right data type
# internally this makes that data the right binary format
# GL_STATIC_DRAW: in OpenGL you can specify what you will be doing with this buffer, static means draw it a lot but never access or alter the data once uploaded.
# I suggest changing this only when hitting performance issues at a time you are doing way more complicated things. In general usage static is the fastest.
glBufferData(GL_ARRAY_BUFFER, sizeof(ctypes.c_float) * len(positions), (ctypes.c_float * len(positions))(*positions), GL_STATIC_DRAW)
# set the second buffer for the triangulation data, GL_ELEMENT_ARRAY_BUFFER indicates the use here
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufs[1])
# upload the triangulation data
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(ctypes.c_uint) * len(elements), (ctypes.c_uint * len(elements))(*elements), GL_STATIC_DRAW)
# because the data is now on the GPU, our python positions & elements can be safely garbage collected hereafter
# turn on the position attribute so OpenGL starts using our array buffer to read vertex positions from
glEnableVertexAttribArray(0)
# set the dimensions of the position attribute, so it consumes 2 floats at a time (default is 4)
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, None)

So that was quite some code, and it is quite simple because we only have positions to deal with right now. But first let’s try to draw it!
Replace the glRecti call with:

# enable the vertex array we initialized, it will bind the right buffers in the background again
glBindVertexArray(self.__vao)
# draw triangles based on the active GL_ELEMENT_ARRAY_BUFFER
# that 6 is the element count, we can save the len(elements) in initializeGL in the future
# that None is because openGL allows us to supply an offset for what element to start drawing at
# (we could only draw the second triangle by offsetting by 3 indices for example)
# problem is that the data type for this must be None or ctypes.c_void_p.
# In many C++ examples you will see just "0" being passed in
# but in PyOpenGL this doesn't work and will result in nothing being drawn.
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, None)

Now we should have an identical picture. Some more info about glVertexAttribPointer:

In OpenGL we can upload as many buffers as we want, but for now I’ll stick with the 2 we have. This means that if we want to (for example) add colors to our mesh, we have to set up multiple attrib pointers, that both point to different parts of the buffer. I like to keep all my vertex data concatenated, so that we could get (x,y,r,g,b,x,y,r,g,b…) etcetera in our buffer.

Now for OpenGL to render, it not only wants to know what buffer to look at (the array_buffer), but it also wants to know how to interpret that data, and what data is provided. OpenGL understands this through attribute locations. Here we activate attribute location 0 (with glEnableVertexAttribArray) and then set our buffer to be 2 floats per vertex at attribute location 0.

The default openGL attribute locations are as follows:
0: position
1: tangent
2: normal
3: color
4: uv

To support multiple attributes in a single buffer we have to use the last 2 arguments of glVertexAttribPointer. The first of those is the stride: the size of all data per vertex, so for a 2D position and an RGB color that would be 5 * sizeof(float). The second of those is the byte offset at which this attribute's data starts within a vertex. Here’s an example to set up position and color:

vertex_data = (0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1) ##
vertex_element_size = 5 ##
elements = (0, 1, 2, 1, 3, 2)
self.__vao = glGenVertexArrays(1)
glBindVertexArray(self.__vao)
bufs = glGenBuffers(2)
glBindBuffer(GL_ARRAY_BUFFER, bufs[0])
glBufferData(GL_ARRAY_BUFFER, sizeof(ctypes.c_float) * len(vertex_data), (ctypes.c_float * len(vertex_data))(*vertex_data), GL_STATIC_DRAW) ##
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufs[1])
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(ctypes.c_uint) * len(elements), (ctypes.c_uint * len(elements))(*elements), GL_STATIC_DRAW)
glEnableVertexAttribArray(3) ##
glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, sizeof(ctypes.c_float) * vertex_element_size, ctypes.c_void_p(2 * sizeof(ctypes.c_float))) ##
glEnableVertexAttribArray(0)
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(ctypes.c_float) * vertex_element_size, None) ##

This is an update to initializeGL; new / changed code ends in ## (because I don’t know how to override the syntax highlighting), and your rectangle will immediately start showing colors [2]!

One last thing. add this to the top of paintGL:

import time
glLoadIdentity()
glScalef(self.height() / float(self.width()), 1.0, 1.0)
glRotate((time.time() % 36.0) * 10, 0, 0, 1)

The first line (after the import) restores the transform state, the second line corrects the aspect ratio (so a square is really square now), and the last line rotates over time. We wrap the time with a modulo because Python’s time.time() is far too large a value; keeping it small gives us something OpenGL can actually work with.

That’s it for part 1!

[1] PyQt5, PySide and PySide2 are (apart from some class renames) also compatible with this.

[2] The color attribute binding only works on NVidia; there is no official default attribute location and most other drivers will ignore glVertexAttribPointer (or do something random) if you do not use a custom shader. So if you’re not seeing colors, don’t worry, and try diving into shaders later!