Python dependency graph

Beware, code heavy post
Node graphs are a great way to express relationships and logical flow between objects, both for code design and 3D scenes. Where a tree can only express single parent-child relationships, a node graph can show a hierarchy and then display how one object has a look-at logic to look at another object entirely.

This can be done for almost any type of data, and a dependency graph is an evaluation model that tries to evaluate as little of the graph as possible. An evaluation model is the way our code traverses the graph’s data to compute the end result.

A dependency graph works by lazily computing data and by storing the “dirty” state of each attribute (dirty meaning something has been changed since the last calculation of this attribute). I’ll try to illustrate with this animation of 2 colors being mixed. When an input changes it propagates that change by “dirtying” all downstream dependencies (red).

When a value is requested, input wires are followed and for outputs the node is computed (yellow), but only if the attribute is “dirty”, else we just use the cached state (green).

So to sum it up in a few simple rules:

To keep it simple I’ll say “if an input on a node changes all its outputs are dirty”, which means nodes should be quite granular as having inputs that only change part of the outputs will waste computation.

Also any connection going out of an attribute (whether it is an input feeding another input directly or an output feeding another input) will dirty all targets. These 2 simple rules result in recursive dirtying of all the right inputs and outputs throughout the graph.

When a value is requested its cached value is returned unless it’s “dirty“ in which case (for outputs) the node will be computed or (for inputs) the other side of the connection will be queried, which results in this rule applying recursively and the minimal graph being computed to get the right value in the end.

Coding it

Now with that theory out of the way, I’ve written some python to do exactly this. First let’s take a look at the initial data structure. A plug must understand connections, ownership, dirty-state and retain an internal value (e.g. the result of computing an output, or just a constant value for an input that is not connected to anything).

class Plug(object):
    def __init__(self, name, node, isComputed):
        self.__name = name  # really used for debugging only
        self.__node = node  # the parent node this plug belongs to
        self.__isComputed = isComputed  # make this an output-plug
        self.__isDirty = isComputed  # upstream dependencies changed? computed plug always have dependencies on creation
        self.__cache = None  # last computed value for outputs, user set value for inputs
        self.__sources = []  # plugs we depend on
        self.__targets = []  # plugs we feed into

    def clean(self):
        self.__isDirty = False

class DependNode(object):
    def __init__(self, computeFunc=None):
        self.__plugs = OrderedDict()  # list of node plugs, dict key should match plug name
        self._computeFunc = computeFunc  # logic implementation, should be a callable that accepts this DependNode instance as only argument

    def compute(self):
        # call compute function if node has logic
        if self._computeFunc is not None:
            self._computeFunc(self)

        # clean all plugs
        for plug in self.__plugs.itervalues():
            plug.clean()

    def addInput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=False)

    def addOutput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=True)

This template allows us to set up a basic node with inputs and outputs and to provide a custom function to calculate this node’s outputs.

const = DependNode()
const.addInput('value')

The next step is to allow setting a value to a specific plug. To do this, I’ll use some meta-programming to make the resulting code more readable. Implementing __getattr__ and __setattr__ on the DependNode as well as value() and setValue() on the plug to alter the Plug.__cache:

# Added to the end of the Plug class:
    def value(self):
        return self.__cache

    def setValue(self, value):
        self.__cache = value

# Added to the end of the DependNode class:
    def __getattr__(self, name):
        return self.__plugs[name].value()

    def __setattr__(self, name, value):
        if name not in ('_DependNode__plugs', '_DependNode__computeFunc'): # To enable setting attrs in __init__ we must revert to default behaviour for those atrtibute names, notice how mangled names are required for private attributes.
            return self.__plugs[name].setValue(value)
        return super(DependNode, self).__setattr__(name, value)

Now we can do this and see the internal state reflected properly:

const.value = 2.0

The next step I suppose is to create a simple “add” node which sums up its inputs. To do this we’ll have to add connections. I’ll implement this on the Plug class, where a plugs sources and targets are managed implicitly:

    def addSource(self, sourcePlug):
        if sourcePlug not in self.__sources:
            self.__sources.append(sourcePlug)
            sourcePlug.__targets.append(self)

    def removeSource(self, sourcePlug):
        if sourcePlug in self.__sources:
            self.__sources.remove(sourcePlug)
            sourcePlug.__targets.remove(self)

The next step is to start implementing some of our rules. First rule we need is: compute when we are dirty. This I do in Plug.value().

    def value(self):
        if self.__isDirty:
            # we are going to get clean, set clean beforehand so the compute function can get other plug values without recursively triggering compute again.
            self.__isDirty = False
            # compute dirty output
            if self.__isComputed:
                self.__node.compute()
            # fetch dirty input connection
            elif self.__sources:
                self.__cache = [source.value() for source in self.__sources]
            # plug is clean now
            self.clean()
        # return internal state
        return self.__cache

So now outputs are computed, inputs return their sources, all this is cached on the plug so it only computes when dirty is true.
The next step is to actually set and propagate the dirty state. So whenever we set a value, set a source, or a plug gets dirtied: all outgoing connections are dirtied. When the dirty happens to an input, the node’s outputs are dirtied. See the _dirty implementation below for the Plug class. The setValue and add- removeSource functions just get a _dirty call in the end.

# modifications to Plug
    def setValue(self, value):
        self.__cache = value
        self._dirty()

    def addSource(self, sourcePlug):
        if sourcePlug not in self.__sources:
            self.__sources.append(sourcePlug)
            sourcePlug.__targets.append(self)
            self._dirty()

    def removeSource(self, sourcePlug):
        if sourcePlug in self.__sources:
            self.__sources.remove(sourcePlug)
            sourcePlug.__targets.remove(self)
            self._dirty()

# added to Plug
    def _isComputed(self):
        return self.__isComputed

    def _dirty(self):
        if self.__isDirty:
            return
        self.__isDirty = True

        # dirty plugs that are computed based on this plug
        if not self.__isComputed:
            for plug in self.__node.iterPlugs():
                if plug._isComputed():
                    plug._dirty()

        # dirty plugs that use this plug as source
        for plug in self.__targets:
            plug._dirty()

# the DependNode class also gets this iterPlugs implementation
    def iterPlugs(self):
        return self.__plugs.itervalues()

Whew, now after all that we should be able to run this test to add 2 and 4 together using 3 nodes:

const = DependNode()
const.addInput('value')
const.value = 2.0

const2 = DependNode()
const2.addInput('value')
const2.value = 4.0


def addFunc(node):
    node.output = sum(node.inputs)


add = DependNode(addFunc)
add.addInput('inputs')
add.addOutput('output')
add.plug('inputs').addSource(const.plug('value'))
add.plug('inputs').addSource(const2.plug('value'))

print add.output

Here is a slightly more interesting example of a pointOnCurve node that computes a point on a bezier segment and feeds it to a different node:

def pointOnCurve(node):
    # interpolate a 2D bezier segment with t=node.parameter
    cvs = node.cvs2[0]
    t = node.parameter[0]
    r = [0, 0]
    for i in xrange(2):
        a, b, c = cvs[1][i] - cvs[0][i], cvs[2][i] - cvs[1][i], cvs[3][i] - cvs[2][i]
        d = b - a
        a, b, c, d = c - b - d, d + d + d, a + a + a, cvs[0][i]
        r[i] = (t * (t * (t * a + b) + c) + d)
    node.point = tuple(r)

timeNode = DependNode()
timeNode.addInput('time')
timeNode.time = 0.0

curveNode = DependNode()
curveNode.addInput('cvs')
curveNode.cvs = ((0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2))

pointOnCurveNode = DependNode(pointOnCurve)
pointOnCurveNode.addInput('cvs2')
pointOnCurveNode.addInput('parameter')
pointOnCurveNode.plug('parameter').addSource(timeNode.plug('time'))
pointOnCurveNode.plug('cvs2').addSource(curveNode.plug('cvs'))
pointOnCurveNode.addOutput('point')

transform = DependNode()
transform.addInput('translate')
transform.plug('translate').addSource(pointOnCurveNode.plug('point'))

print transform.translate
timeNode.time = 0.5
print transform.translate

Now there are 2 more improvements I want to add. First whenever a plug has sources, there is no real need to cache the result. We can just directly read from the sources and save ourselves a bunch of copying. Second I want to be able to alter an input plug temporarily. Imagine an interface with a point moving over time, it may be nice to alter that point by hand to e.g. feed that back in some animation system. In this case it is important for us to set the translate of a transform without it jumping back to the source time as soon as we redraw (which would read the translate attribute).

First I edit Plug.value() to use the cache only for computed data.
Now that __cache is freed up I plan to use it for the user-override. So if a cache is available I want to use it at all times.
Next I return sources if available, else cache again which should in this case be computed.

Next in Plug._dirty() I only dirty computed plugs, and I set the cache to None if the plug as sources coming in. This will result in a small problem with Plug.setValue: it currently caches the value and then dirties the plug. That means that on input plugs the cache is set to None immediately. I just swap those 2 lines so my setValue sticks, and the user override also works.

    def value(self):
        if self.__isDirty and self.__isComputed:
            # compute dirty output
            if self.__isComputed:
                self.__node.compute()
            # plug is clean now
            self.clean()

        # override cache for input attributes to intervene with connections?
        if self.__cache is not None:
            return self.__cache

        # fetch input connection
        if self.__sources:
            return [source.value() for source in self.__sources]

        return self.__cache

    def _dirty(self):
        if self.__isComputed:
            # don't dirty again
            if self.__isDirty:
                return
            self.__isDirty = True
        if self.__sources:
            self.__cache = None

        # dirty plugs that are computed based on this plug
        if not self.__isComputed:
            for plug in self.__node.iterPlugs():
                if plug._isComputed():
                    plug._dirty()

        # dirty plugs that use this plug as source
        for plug in self.__targets:
            plug._dirty()

    def setValue(self, value):
        self._dirty()
        self.__cache = value

Using the previous example I can now see how overriding the parameter works until time is changed again making it use the source connection again:

print transform.translate
timeNode.time = 0.5
print transform.translate
pointOnCurveNode.parameter = 0.25
print transform.translate
timeNode.time = 0.5
print transform.translate

Full code:

from collections import OrderedDict


class Plug(object):
    def __init__(self, name, node, isComputed):
        self.__name = name  # really used for debugging only
        self.__node = node  # the parent node this plug belongs to
        self.__isComputed = isComputed  # make this an output-plug
        self.__isDirty = isComputed  # upstream dependencies changed?
        self.__cache = None  # last computed value for outputs, user set value for inputs
        self.__sources = []  # plugs we depend on
        self.__targets = []  # plugs we feed into

    def clean(self):
        self.__isDirty = False

    def value(self):
        if self.__isDirty and self.__isComputed:
            # compute dirty output
            if self.__isComputed:
                self.__node.compute()
            # plug is clean now
            self.clean()

        # override cache for input attributes to intervene with connections?
        if self.__cache is not None:
            return self.__cache

        # fetch input connection
        if self.__sources:
            return [source.value() for source in self.__sources]

        return self.__cache

    def setValue(self, value):
        self._dirty()
        self.__cache = value

    def addSource(self, sourcePlug):
        if sourcePlug not in self.__sources:
            self.__sources.append(sourcePlug)
            sourcePlug.__targets.append(self)
            self._dirty()

    def removeSource(self, sourcePlug):
        if sourcePlug in self.__sources:
            self.__sources.remove(sourcePlug)
            sourcePlug.__targets.remove(self)
            self._dirty()

    def _isComputed(self):
        return self.__isComputed

    def _dirty(self):
        if self.__isComputed:
            # don't dirty again
            if self.__isDirty:
                return
            self.__isDirty = True
        if self.__sources:
            self.__cache = None

        # dirty plugs that are computed based on this plug
        if not self.__isComputed:
            for plug in self.__node.iterPlugs():
                if plug._isComputed():
                    plug._dirty()

        # dirty plugs that use this plug as source
        for plug in self.__targets:
            plug._dirty()


class DependNode(object):
    def __init__(self, computeFunc=None):
        self.__plugs = OrderedDict()  # list of node plugs, dict key should match plug name
        self.__computeFunc = computeFunc  # logic implementation, should be a callable that accepts this DependNode instance as only argument

    def iterPlugs(self):
        return self.__plugs.itervalues()

    def compute(self):
        # call compute function if node has logic
        if self.__computeFunc is not None:
            self.__computeFunc(self)

        # clean all plugs
        for plug in self.__plugs.itervalues():
            plug.clean()

    def addInput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=False)

    def addOutput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=True)

    def plug(self, name):
        return self.__plugs[name]

    def __getattr__(self, name):
        return self.__plugs[name].value()

    def __setattr__(self, name, value):
        if name not in ('_DependNode__plugs', '_DependNode__computeFunc'):
            return self.__plugs[name].setValue(value)
        return super(DependNode, self).__setattr__(name, value)


def pointOnCurve(node):
    print 'compute'
    # interpolate a 2D bezier segment with t=node.parameter
    cvs = node.cvs2[0]
    t = node.parameter[0]
    r = [0, 0]
    for i in xrange(2):
        a, b, c = cvs[1][i] - cvs[0][i], cvs[2][i] - cvs[1][i], cvs[3][i] - cvs[2][i]
        d = b - a
        a, b, c, d = c - b - d, d + d + d, a + a + a, cvs[0][i]
        r[i] = (t * (t * (t * a + b) + c) + d)
    node.point = tuple(r)

Now I’ve also been working on sort of an entity-component based system on top of this graph, based on the PyOpenGL mesh renderer from a few posts back.

It allows me to create a scene graph based on Transforms and exported logic (using DependNodes like above) but also logical relationships which are not necessarily customizable, e.g. when rendering a camera I can query it’s entity, whose transform will provide the view matrix, without connecting these things. This creates a more classical (and less customizable) framework for render logic, but gives full control over the scene logic behind it through the dependency graph.

Using the point on curve example above and feeding it to a cube’s transform – while also implementing a timeline UI & 3D gizmo drawing – this fun thing came out:

Now the next steps would probably involve adding shader support, file watchers and creating a more elaborate Maya exporter that supports various Maya nodes (Maya is also dependency graph based so it fits very well with this!) but I’m not sure if I’m going to keep working on this.

Improving a renderer

This feeds into my previous write up on the tools developed for our 64kb endeavours.

After creating Eidolon [Video] we were left with the feeling that the rendering can be a lot better. We had this single pass bloom and simple lambert & phong shading, no anti aliasing and very poor performing depth of field. Last the performance hit for reflections was through the roof as well.

I started almost immediately with a bunch of improvements, most of this work was done within a month after Revision. Which shows in our newest demo Yermom [Video]. I’ll go over the improvements in chronological order and credit any sources used (of which there were a lot), if I managed to document that right…

Something useful to mention, all my buffers are Float32 RGBA.

Low-resolution reflections:

Basically the scene is raymarched, for every pixel there is a TraceAndShade call to render the pixel excluding fog and reflection.
From the result we do another TraceAndShade for the reflection. This makes the entire thing twice as slow when reflections are on.
Instead I early out at this point if:
if(reflectivity == 0 || gl_FragCoord.x % 4 != 0 || gl_FragCoord.y % 4 != 0) return;
That results in only 1 in 16 pixels being reflective. So instead of compositing the reflection directly I write it to a separate buffer.
Then in a future pass I composite the 2 buffers, where I just do a look up in the reflection buffer like so:
texelFetch(uImages[0], ivec2(gl_FragCoord.xy)) + texelFetch(uImages[1], ivec2(gl_FragCoord.xy / 4) * 4)
In my real scenario I removed that * 4 and render to a 4 times smaller buffer instead, so reading it back results in free interpolation.
I still have glitches when blurring the reflections too much & around edges in general. Definitely still room for future improvement.

Oren Nayar diffuse light response

The original paper and this image especially convinced me into liking this shading model for diffuse objects.

So I tried to implement that, failed a few times, got pretty close, found an accurate implementation, realized it was slow, and ended on these 2 websites:
http://www.popekim.com/2011/11/optimized-oren-nayar-approximation.html
http://www.artisticexperiments.com/cg-shaders/cg-shaders-oren-nayar-fast

That lists a nifty trick to fake it, I took away some terms as I realized they contributed barely any visible difference, so I got something even less accurate. I already want to revisit this, but it’s one of the improvements I wanted to share nonetheless.

float orenNayarDiffuse(float satNdotV, float satNdotL, float roughness)
{
    float lambert = satNdotL;
    if(roughness == 0.0)
        return lambert;
    float softRim = saturate(1.0 - satNdotV * 0.5);

    // my magic numbers
    float fakey = pow(lambert * softRim, 0.85);
    return mix(lambert, fakey * 0.85, roughness);
}

GGX specular

There are various open source implementations of this. I found one here:
http://filmicworlds.com/blog/optimizing-ggx-shaders-with-dotlh/
It talks about tricks to optimize things by precomputing a lookup texture, I didn’t go that far. There’s not much I can say about this, as I don’t fully understand the math and how it changes from the basic phong dot(N, H).

float G1V(float dotNV, float k){return 1.0 / (dotNV * (1.0 - k)+k);}

float ggxSpecular(float NdotV, float NdotL, vec3 N, vec3 L, vec3 V, float roughness)
{
    float F0 = 0.5;

    vec3 H = normalize(V + L);
    float NdotH = saturate(dot(N, H));
    float LdotH = saturate(dot(L, H));
    float a2 = roughness * roughness;

    float D = a2 / (PI * sqr(sqr(NdotH) * (a2 - 1.0) + 1.0));
    float F = F0 + (1.0 - F0) * pow(1.0 - LdotH, 5.0);
    float vis = G1V(NdotL, a2 * 0.5) * G1V(NdotV, a2 * 0.5);
    return NdotL * D * F * vis;
}

FXAA

FXAA3 to be precise. There whitepaper is quite clear, still why bother writing it if it’s open source. I can’t remember which one I used, but here’s a few links:
https://gist.github.com/kosua20/0c506b81b3812ac900048059d2383126
https://github.com/urho3d/Urho3D/blob/master/bin/CoreData/Shaders/GLSL/FXAA3.glsl
https://github.com/vispy/experimental/blob/master/fsaa/fxaa.glsl
Preprocessed and minified for preset 12 made it very small in a compressed executable. Figured I’d just share it.

#version 420
uniform vec3 uTimeResolution;uniform sampler2D uImages[1];out vec4 z;float aa(vec3 a){vec3 b=vec3(.299,.587,.114);return dot(a,b);}
#define bb(a)texture(uImages[0],a)
#define cc(a)aa(texture(uImages[0],a).rgb)
#define dd(a,b)aa(texture(uImages[0],a+(b*c)).rgb)
void main(){vec2 a=gl_FragCoord.xy/uTimeResolution.yz,c=1/uTimeResolution.yz;vec4 b=bb(a);b.y=aa(b.rgb);float d=dd(a,vec2(0,1)),e=dd(a,vec2(1,0)),f=dd(a,vec2(0,-1)),g=dd(a,vec2(-1,0)),h=max(max(f,g),max(e,max(d,b.y))),i=h-min(min(f,g),min(e,min(d,b.y)));if(i<max(.0833,h*.166)){z=bb(a);return;}h=dd(a,vec2(-1,-1));float j=dd(a,vec2( 1,1)),k=dd(a,vec2( 1,-1)),l=dd(a,vec2(-1,1)),m=f+d,n=g+e,o=k+j,p=h+l,q=c.x;
bool r=abs((-2*g)+p)+(abs((-2*b.y)+m)*2)+abs((-2*e)+o)>=abs((-2*d)+l+j)+(abs((-2*b.y)+n)*2)+abs((-2*f)+h+k);if(!r){f=g;d=e;}else q=c.y;h=f-b.y,e=d-b.y,f=f+b.y,d=d+b.y,g=max(abs(h),abs(e));i=clamp((abs((((m+n)*2+p+o)*(1./12))-b.y)/i),0,1);if(abs(e)<abs(h))q=-q;else f=d;vec2 s=a,t=vec2(!r?0:c.x,r?0:c.y);if(!r)s.x+=q*.5;else s.y+=q*.5;
vec2 u=vec2(s.x-t.x,s.y-t.y);s=vec2(s.x+t.x,s.y+t.y);j=((-2)*i)+3;d=cc(u);e=i*i;h=cc(s);g*=.25;i=b.y-f*.5;j=j*e;d-=f*.5;h-=f*.5;bool v,w,x,y=i<0;
#define ee(Q) v=abs(d)>=g;w=abs(h)>=g;if(!v)u.x-=t.x*Q;if(!v)u.y-=t.y*Q;x=(!v)||(!w);if(!w)s.x+=t.x*Q;if(!w)s.y+=t.y*Q;
#define ff if(!v)d=cc(u.xy);if(!w)h=cc(s.xy);if(!v)d=d-f*.5;if(!w)h=h-f*.5;
ee(1.5)if(x){ff ee(2.)if(x){ff ee(4.)if(x){ff ee(12.)}}}e=a.x-u.x;f=s.x-a.x;if(!r){e=a.y-u.y;f=s.y-a.y;}q*=max((e<f?(d<0)!=y:(h<0)!=y)?(min(e,f)*(-1/(f+e)))+.5:0,j*j*.75);if(!r)a.x+=q;else a.y+=q;z=bb(a);}

Multi pass bloom

The idea for this one was heavily inspired by this asset for Unity:

https://www.assetstore.unity3d.com/en/#!/content/17324

I’m quite sure the technique is not original, but that’s where I got the idea.

The idea is to downsample and blur at many resolutions and them combine the (weighted) results to get a very high quality full screen blur.
So basically downsample to a quarter (factor 2) of the screen using this shader:

#version 420

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];

out vec4 outColor0;

void main()
{
    outColor0 = 0.25 * (texture(uImages[0], (gl_FragCoord.xy + vec2(-0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(0.5, -0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(0.5, 0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(-0.5, 0.5)) / uResolution));
}

Then downsample that, and recurse until we have a factor 64

All the downsamples fit in the backbuffer, so in theory that together with the first blur pass can be done in 1 go using the backbuffer as sampler2D as well. But to avoid the hassle of figuring out the correct (clamped!) uv coordinates I just use a ton of passes.

Then take all these downsampled buffers and ping pong them for blur passes, so for each buffer:
HBLUR taking steps of 2 pixels, into a buffer of the same size
VBLUR, back into the initial downsampled buffer
HBLUR taking steps of 3 pixels, reuse the HBLUR buffer
VBLUR, reuse the initial downsampled buffer

The pixel steps is given to uBlurSize, the direction of blur is given to uDirection.

#version 420

out vec4 color;

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];
uniform vec2 uDirection;
uniform float uBlurSize;

const float curve[7] = { 0.0205,
    0.0855,
    0.232,
    0.324,
    0.232,
    0.0855,
    0.0205 };

void main()
{
    vec2 uv = gl_FragCoord.xy / uResolution;
    vec2 netFilterWidth = uDirection / uResolution * uBlurSize;
    vec2 coords = uv - netFilterWidth * 3.0;

    color = vec4(0);
    for( int l = 0; l < 7; l++ )
    {
        vec4 tap = texture(uImages[0], coords);
        color += tap * curve[l];
        coords += netFilterWidth;
    }
}

Last we combine passes with lens dirt. uImages[0] is the original backbuffer, 1-6 is all the downsampled and blurred buffers, 7 is a lens dirt image.
My lens dirt texture is pretty poor, its just a precalced texture with randomly scaled and colored circles and hexagons, sometimes filled and sometimes outlines.
I don’t think I actually ever used the lens dirt or bloom intensity as uniforms.

#version 420

out vec4 color;

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[8];
uniform float uBloom = 0.04;
uniform float uLensDirtIntensity = 0.3;

void main()
{
    vec2 coord = gl_FragCoord.xy / uResolution;
    color = texture(uImages[0], coord);

    vec3 b0 = texture(uImages[1], coord).xyz;
    vec3 b1 = texture(uImages[2], coord).xyz * 0.6; // dampen to have less banding in gamma space
    vec3 b2 = texture(uImages[3], coord).xyz * 0.3; // dampen to have less banding in gamma space
    vec3 b3 = texture(uImages[4], coord).xyz;
    vec3 b4 = texture(uImages[5], coord).xyz;
    vec3 b5 = texture(uImages[6], coord).xyz;

    vec3 bloom = b0 * 0.5
        + b1 * 0.6
        + b2 * 0.6
        + b3 * 0.45
        + b4 * 0.35
        + b5 * 0.23;

    bloom /= 2.2;
    color.xyz = mix(color.xyz, bloom.xyz, uBloom);

    vec3 lens = texture(uImages[7], coord).xyz;
    vec3 lensBloom = b0 + b1 * 0.8 + b2 * 0.6 + b3 * 0.45 + b4 * 0.35 + b5 * 0.23;
    lensBloom /= 3.2;
    color.xyz = mix(color.xyz, lensBloom, (clamp(lens * uLensDirtIntensity, 0.0, 1.0)));
    
    color.xyz = pow(color.xyz, vec3(1.0 / 2.2));
}

White lines on a cube, brightness of 10.

White lines on a cube, brightness of 300.

Sphere tracing algorithm

Instead of a rather naive sphere tracing loop I used in a lot of 4kb productions and can just write by heart I went for this paper:
http://erleuchtet.org/~cupe/permanent/enhanced_sphere_tracing.pdf
It is a clever technique that involves overstepping and backgracking only when necessary, as well as keeping track of pixel size in 3D to realize when there is no need to compute more detail. The paper is full of code snippets and clear infographics, I don’t think I’d be capable to explain it any clearer.

Beauty shots

Depth of field

I initially only knew how to do good circular DoF, until this one came along: https://www.shadertoy.com/view/4tK3WK
Which I used initially, but to get it to look good was really expensive, because it is all single pass. Then I looked into a 3-blur-pass solution, which sorta worked, but when I went looking for more optimized versions I found this 2 pass one: https://www.shadertoy.com/view/Xd3GDl. It works extremely well, the only edge cases I found were when unfocusing a regular grid of bright points.

Here’s what I wrote to get it to work with a depth buffer (depth based blur):

const int NUM_SAMPLES = 16;

void main()
{
    vec2 fragCoord = gl_FragCoord.xy;

    const vec2 blurdir = vec2( 0.0, 1.0 );
    vec2 blurvec = (blurdir) / uResolution;
    vec2 uv = fragCoord / uResolution.xy;

    float z = texture(uImages[0], uv).w;
    fragColor = vec4(depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES), z);
}

Second pass:

const int NUM_SAMPLES = 16;

void main()
{
    vec2 uv = gl_FragCoord.xy / uResolution;

    float z = texture(uImages[0], uv).w;

    vec2 blurdir = vec2(1.0, 0.577350269189626);
    vec2 blurvec = normalize(blurdir) / uResolution;
    vec3 color0 = depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES);

    blurdir = vec2(-1.0, 0.577350269189626);
    blurvec = normalize(blurdir) / uResolution;
    vec3 color1 = depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES);

    vec3 color = min(color0, color1);
    fragColor = vec4(color, 1.0);
}

Shared header:

#version 420

// default uniforms
uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];

uniform float uSharpDist = 15; // distance from camera that is 100% sharp
uniform float uSharpRange = 0; // distance from the sharp center that remains sharp
uniform float uBlurFalloff = 1000; // distance from the edge of the sharp range it takes to become 100% blurry
uniform float uMaxBlur = 16; // radius of the blur in pixels at 100% blur

float CoC(float z)
{
    return uMaxBlur * min(1, max(0, abs(z - uSharpDist) - uSharpRange) / uBlurFalloff);
}

out vec4 fragColor;

//note: uniform pdf rand [0;1)
float hash1(vec2 p)
{
    p = fract(p * vec2(5.3987, 5.4421));
    p += dot(p.yx, p.xy + vec2(21.5351, 14.3137));
    return fract(p.x * p.y * 95.4307);
}

#define USE_RANDOM

vec3 depthDirectionalBlur(float z, float coc, vec2 uv, vec2 blurvec, int numSamples)
{
    // z: z at UV
    // coc: blur radius at UV
    // uv: initial coordinate
    // blurvec: smudge direction
    // numSamples: blur taps
    vec3 sumcol = vec3(0.0);

    for (int i = 0; i < numSamples; ++i)
    {
        float r =
            #ifdef USE_RANDOM
            (i + hash1(uv + float(i + uTime)) - 0.5)
            #else
            i
            #endif
            / float(numSamples - 1) - 0.5;
        vec2 p = uv + r * coc * blurvec;
        vec4 smpl = texture(uImages[0], p);
        if(smpl.w < z) // if sample is closer consider it's CoC
        {
            p = uv + r * min(coc, CoC(smpl.w)) * blurvec;
            p = uv + r * CoC(smpl.w) * blurvec;
            smpl = texture(uImages[0], p);
        }
        sumcol += smpl.xyz;
    }

    sumcol /= float(numSamples);
    sumcol = max(sumcol, 0.0);

    return sumcol;
}

Additional sources used for a longer time

Distance function library

http://mercury.sexy/hg_sdf/
A very cool site explaining all kinds of things you can do with this code. I think many of these functions were invented already, but with some bonusses as ewll as a very clear code style and excellent documentations for full accessibility.
For an introduction to this library:
https://www.youtube.com/watch?v=T-9R0zAwL7s

Noise functions

https://www.shadertoy.com/view/4djSRW
Hashes optimized to only implement hash4() and the rest is just swizzling and redirecting, so a float based hash is just:

float hash1(float x){return hash4(vec4(x)).x;}
vec2 hash2(float x){return hash4(vec4(x)).xy;}

And so on.

Value noise
https://www.shadertoy.com/view/4sfGzS
https://www.shadertoy.com/view/lsf3WH

Voronoi 2D
https://www.shadertoy.com/view/llG3zy
Voronoi is great, as using the center distance we get worley noise instead, and we can track cell indices for randomization.
This is fairly fast, but still too slow to do realtime. So I implemented tileable 2D & 3D versions.

Perlin
Layering the value noise for N iterations, scaling the UV by 2 and weight by 0.5 in every iteration.
These could be controllable parameters for various different looks. A slower weight decrease results in a more wood-grain look for example.

float perlin(vec2 p, int iterations)
{
    float f = 0.0;
    float amplitude = 1.0;

    for (int i = 0; i < iterations; ++i)
    {
        f += snoise(p) * amplitude;
        amplitude *= 0.5;
        p *= 2.0;
    }

    return f * 0.5;
}

Now the perlin logic can be applied to worley noise (voronoi center) to get billows. I did the same for the voronoi edges, all tileable in 2D and 3D for texture precalc. Here’s an example. Basically the modulo in the snoise function is the only thing necessary to make things tileable. Perlin then just uses that and keeps track of the scale for that layer.

float snoise_tiled(vec2 p, float scale)
{
    p *= scale;
    vec2 c = floor(p);
    vec2 f = p - c;
    f = f * f * (3.0 - 2.0 * f);
    return mix(mix(hash1(mod(c + vec2(0.0, 0.0), scale) + 10.0),
    hash1(mod(c + vec2(1.0, 0.0), scale) + 10.0), f.x),
    mix(hash1(mod(c + vec2(0.0, 1.0), scale) + 10.0),
    hash1(mod(c + vec2(1.0, 1.0), scale) + 10.0), f.x), f.y);
}
float perlin_tiled(vec2 p, float scale, int iterations)
{
    float f = 0.0;
    p = mod(p, scale);
    float amplitude = 1.0;
    
    for (int i = 0; i < iterations; ++i)
    {
        f += snoise_tiled(p, scale) * amplitude;
        amplitude *= 0.5;
        scale *= 2.0;
    }

    return f * 0.5;
}

Creating a tool to make a 64k demo

In the process of picking up this webpage again, I can talk about something we did quite a while ago. I, together with a team, went through the process of making a 64 kilobyte demo. We happened to win at one of the biggest demoscene events in europe. Revision 2017. I still feel the afterglow of happiness from that.

If you’re not sure what that is, read on, else, scroll down! You program a piece of software that is only 64 kb in size, that shows an audio-visual experience generated in realtime. To stay within such size limits you have to generate everything, we chose to go for a rendering technique called ray marching, that allowed us to put all 3D modeling, texture generation, lighting, etc. as ascii (glsl sources) in the executable. On top of that we used a very minimal (yet versatile) modular synthesizer called 64klang2. Internally it stores a kind of minimal midi data and the patches and it can render amazing audio in realtime, so it doesn’t need to pre-render the song or anything. All this elementary and small size data and code compiles to something over 200kb, which is then compressed using an executable packer like UPX or kkrunchy

It was called Eidolon. You can watch a video:
https://youtu.be/rsZHBJdaz-Y
Or stress test your GPU / leave a comment here:
http://www.pouet.net/prod.php?which=69669

The technologies used were fairly basic, it’s very old school phong & lambert shading, 2 blur passes for bloom, so all in all pretty low tech and not worth discussing. What I would like to discuss is the evolution of the tool. I’ll keep it high level this time though. Maybe in the future I can talk about specific implementations of things, but just seeing the UI will probably explain a lot of the features and the way things work.

Step 1: Don’t make a tool from scratch

Our initial idea was to leverage existing software. One of our team members, who controlled the team besides modelling and eventually directing the whole creative result, had some experience with a real-time node based software called Touch Designer. It is a tool where you can do realtime visuals, and it supports exactly what we need: rendering into a 2D texture with a fragment shader.

We wanted to have the same rendering code for all scenes, and just fill in the modeling and material code that is unique per scene. We figured out how to concatenate separate pieces of text and draw them into a buffer. Multiple buffers even. At some point i packed all code and rendering logic of a pass into 1 grouped node and we could design our render pipeline entirely node based.

Here you see the text snippets (1) merged into some buffers (2) and then post processed for the bloom (3). On the right (4) you see the first problem we hit with Touch Designer. The compiler error log is drawn inside this node. There is basically no easy way to have that error visible in the main application somewhere. So the first iteration of the renderer (and coincidentally the main character of Eidolon) looked something like this:

The renderer didn’t really change after this.

In case I sound too negative about touch designer in the next few paragraphs, our use case was rather special, so take this with a grain of salt!

We have a timeline control, borrowed the UI design from Maya a little, so this became the main preview window. That’s when we hit some problems though. The software has no concept of window focus, so it’d constantly suffer hanging keys or responding to keys while typing in the text editor.

Last issue that really killed it though: everything has to be in 1 binary file. There is no native way to reference external text files for the shader code, or merge node graphs. There is a really weird utility that expands the binary to ascii, but then literally every single node is a text file so it is just unmergeable.

Step 2: Make a tool

So then this happened:

Over a week’s time in the evenings and then 1 long saturday I whipped this up using PyQt and PyOpenGL. This is the first screenshot I made, the curve editor isn’t actually an editor yet and there is no concept of camera shots (we use this to get hard cuts).

It has all the same concepts however, separate text files for the shader code, with an XML file determining what render passes use what files and in what buffer they render / what buffers they reference in turn. With the added advantage of the perfect granularity all stored in ascii files.

Some files are template-level, some were scene-level, so creating a new scene actually only copies the scene-level fies which can them be adjusted in a text editor, with a file watcher updating the picture. The CurveEditor feeds right back into the uniforms of the shader (by name) and the time slider at the bottom is the same idea as Maya / what you saw before.

Step 3: Make it better

Render pipeline
The concept was to set up a master render pipeline into which scenes would inject snippets of code. On disk this became a bunch of snippets, and an XML based template definition. This would be the most basic XML file:

<template>
    <pass buffer="0" outputs="1">
        <global path="header.glsl"/>
        <section path="scene.glsl"/>
        <global path="pass.glsl"/>
    </pass>
    <pass input0="0">
        <global path="present.glsl"/>
    </pass>
</template>

This will concatenated 3 files to 1 fragment shader, render into full-screen buffer “0” and then use present.glsl as another fragment shader, which in turn has the previous buffer “0” as input (forwarded to a sampler2D uniform).

This branched out into making static bufffers (textures), setting buffer sizes (smaller textures), multiple target buffers (render main and reflection pass at once), set buffer size to a portion of the screen (downsampling for bloom), 3D texture support (volumetric noise textures for cloud).

Creating a new scene will just copy “scene.glsl” from the template to a new folder, there you can then fill out the necessary function(s) to get a unique scene. Here’s an example from our latest Evoke demo. 6 scenes, under which you see the “section” files for each scene.

Camera control
The second important thing I wanted to tackle was camera control. Basically the demo will control the camera based on some animation data, but it is nice to fly around freely and even use the current camera position as animation keyframe. So this was just using Qt’s event system to hook up the mouse and keyboard to the viewport.

I also created a little widget that displays where the camera is, has an “animation input or user input” toggle as well as a “snap to current animation frame” button.

Animation control
So now to animate the camera, without hard coding values! Or even typing numbers, preferably. I know a lot of people use a tracker-like tool called Rocket, I never used it and it looks an odd way to control animation data to me. I come from a 3D background, so I figured I’d just want a curve editor like e.g. Maya has. In Touch Designer we also had a basic curve editor, conveniently you can name a channel the same as a uniform, then just have code evaluate the curve at the current time and send the result to that uniform location.
Some trickery was necessary to pack vec3s, I just look for channels that start with the same name and then end in .x, .y, .z, and possibly .w.

Here’s an excerpt from a long camera shot with lots of movement, showing off our cool hermite splines. At the top right you can see we have several built in tangent modes, we never got around to building custom tangent editing. In the end this is more than enough however. With flat tangents we can create easing/acceleration, with spline tangents we can get continuous paths and with linear tangents we get continuous speed. Next to that are 2 cool buttons that allow us to feed the camera position to another uniform, so you can literally fly to a place where you want to put an object. It’s not as good as actual move/rotate widgets but for the limited times we need to place 3D objects it’s great.

Hard cuts
Apart from being impossible to represent in this interface, we don’t support 2 keys at identical times. This means that we can’t really have the camera “jump” to a new position instantly. With a tiny amount of curve inbetween the previous and the next shot position, the time cursor can actually render 1 frame of a random camera position. So we had to solve this. I think it is one of the only big features that you won’t see in the initial screenshot above actually.

Introducing camera shots. A shot has its own “scene it should display” and its own set of animation data. So selecting a different shot yields different curve editor content. Shots are placed on a shared timeline, so scrolling through time will automatically show the right shot and setting a keyframe will automatically figure out the “shot local time” to put the key based on the global demo time. The curve editor has it’s own playhead that is directly linked to the global timeline as well so we can adjust the time in multiple places.

When working with lots of people we had issues with people touching other people’s (work in progress) shots. Therefore we introduced “disabling” of shots. This way anyone could just prefix their shots and disable them before submitting, and we could mix and match shots from several people to get a final camera flow we all liked.

Shots are also rendered on the timeline as colored blocks. The grey block underneath those is our “range slider”. It makes the top part apply on only a subsection of the demo, so it is easy to loop a specific time range, or just zoom in far enough to use the mouse to change the time granularly enough.

The devil is in the details
Some things I overlooked in the first implementation, and some useful things I added only recently.
1. Undo/Redo of animation changes. Not unimportant, and luckily not hard to add with Qt.
2. Ctrl click timeline to immediately start animating that shot
3. Right click a shot to find the scene
4. Right click a scene to create a shot for that scene in particular
5. Current time display in minutes:seconds instead of just beats
6. BPM stored per-project instead of globally
7. Lots of hotkeys!

These things make the tool just that much faster to use.

Finally, here’s our tool today. There’s still plenty to be done, but we made 2 demos with it so far and it gets better every time!

Part 3: Importing and drawing a custom mesh file

Part 3: Creating an importer

This is part 3 of a series and it is about getting started with visualizing triangle meshes with Python 2.7 using the libraries PyOpenGL and PyQt4.

Part 1
Part 2
Part 3

I will assume you know python, you will not need a lot of Qt or OpenGL experience, though I will also not go into the deeper details of how OpenGL works. For that I refer you to official documentation and the excellent (C++) tutorials at https://open.gl/. Although they are C++, there is a lot of explanation about OpenGL and why to do certain calls in a certain order.

On a final note: I will make generalizations and simplifications when explaining things. If you think something works different then I say it probably does, this is to try and convey ideas to beginners, not to explain low level openGL implementations.

3.1 Importing

Now with our file format resembling openGL so closely makes this step relatively easy. First I’ll declare some globals, because openGL does not have real enums but just a bunch of global constants I make some groups to do testing and data mapping against.

from OpenGL.GL import *

attributeElementTypes = (GL_BYTE,
                        GL_UNSIGNED_BYTE,
                        GL_SHORT,
                        GL_UNSIGNED_SHORT,
                        GL_INT,
                        GL_UNSIGNED_INT,
                        GL_HALF_FLOAT,
                        GL_FLOAT,
                        GL_DOUBLE,
                        GL_FIXED,
                        GL_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_10F_11F_11F_REV)
sizeOfType = {GL_BYTE: 1,
             GL_UNSIGNED_BYTE: 1,
             GL_SHORT: 2,
             GL_UNSIGNED_SHORT: 2,
             GL_INT: 4,
             GL_UNSIGNED_INT: 4,
             GL_HALF_FLOAT: 2,
             GL_FLOAT: 4,
             GL_DOUBLE: 8,
             GL_FIXED: 4,
             GL_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_10F_11F_11F_REV: 4}
drawModes = (GL_POINTS,
            GL_LINE_STRIP,
            GL_LINE_LOOP,
            GL_LINES,
            GL_LINE_STRIP_ADJACENCY,
            GL_LINES_ADJACENCY,
            GL_TRIANGLE_STRIP,
            GL_TRIANGLE_FAN,
            GL_TRIANGLES,
            GL_TRIANGLE_STRIP_ADJACENCY,
            GL_TRIANGLES_ADJACENCY,
            GL_PATCHES)
indexTypeFromSize = {1: GL_UNSIGNED_BYTE, 2: GL_UNSIGNED_SHORT, 4: GL_UNSIGNED_INT}

Next up is a Mesh class that stores a vertex array object (and corresponding buffers for deletion) along with all info necessary to draw the mesh once it’s on the GPU.

class Mesh(object):
    def __init__(self, vao, bufs, drawMode, indexCount, indexType):
        self.__vao = vao
        self.__bufs = bufs
        self.__drawMode = drawMode
        self.__indexCount = indexCount
        self.__indexType = indexType
 
    def __del__(self):
        glDeleteBuffers(len(self.__bufs), self.__bufs)
        glDeleteVertexArrays(1, [self.__vao])
 
    def draw(self):
        glBindVertexArray(self.__vao)
        glDrawElements(self.__drawMode, self.__indexCount, self.__indexType, None)

Now let’s, given a file path, open up the file and run the importer for the right version (if known).

def model(filePath):
    vao = glGenVertexArrays(1)
    glBindVertexArray(vao)
    bufs = glGenBuffers(2)
    glBindBuffer(GL_ARRAY_BUFFER, bufs[0])
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufs[1])
    with open(filePath, 'rb') as fh:
        fileVersion = struct.unpack('B', fh.read(1))[0]
        if fileVersion == 0:
            return _loadMesh_v0(fh, vao, bufs)
        raise RuntimeError('Unknown mesh file version %s in %s' % (fileVersion, filePath))

Next we can start reading the rest of the file:

    vertexCount = struct.unpack('I', fh.read(4))[0]
    vertexSize = struct.unpack('B', fh.read(1))[0]
    indexCount = struct.unpack('I', fh.read(4))[0]
    indexSize = struct.unpack('B', fh.read(1))[0]
    assert indexSize in indexTypeFromSize, 'Unknown element data type, element size must be one of %s' % indexTypeFromSize.keys()
    indexType = indexTypeFromSize[indexSize]
    drawMode = struct.unpack('I', fh.read(4))[0]
    assert drawMode in (GL_LINES, GL_TRIANGLES), 'Unknown draw mode.'  # TODO: list all render types

Read and apply the attribute layout:

# gather layout
numAttributes = struct.unpack('B', fh.read(1))[0]
offset = 0
layouts = []
for i in xrange(numAttributes):
   location = struct.unpack('B', fh.read(1))[0]
   dimensions = struct.unpack('B', fh.read(1))[0]
   assert dimensions in (1, 2, 3, 4)
   dataType = struct.unpack('I', fh.read(4))[0]
   assert dataType in attributeElementTypes, 'Invalid GLenum value for attribute element type.'
   layouts.append((location, dimensions, dataType, offset))
   offset += dimensions * sizeOfType[dataType]
# apply
for layout in layouts:
   glVertexAttribPointer(layout[0], layout[1], layout[2], GL_FALSE, offset, ctypes.c_void_p(layout[3]))  # total offset is now stride
   glEnableVertexAttribArray(layout[0])

Read and upload the raw buffer data. This step is easy because we can directly copy the bytes as the storage matches exactly with how openGL expects it due to the layout code above.

raw = fh.read(vertexSize * vertexCount)
glBufferData(GL_ARRAY_BUFFER, vertexSize * vertexCount, raw, GL_STATIC_DRAW)
raw = fh.read(indexSize * indexCount)
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexSize * indexCount, raw, GL_STATIC_DRAW)

3.2 The final code

This is the application code including all the rendering from chapter 1, only the rectangle has been replaced by the loaded mesh.

# the importer
import struct
from OpenGL.GL import *

attributeElementTypes = (GL_BYTE,
                        GL_UNSIGNED_BYTE,
                        GL_SHORT,
                        GL_UNSIGNED_SHORT,
                        GL_INT,
                        GL_UNSIGNED_INT,
                        GL_HALF_FLOAT,
                        GL_FLOAT,
                        GL_DOUBLE,
                        GL_FIXED,
                        GL_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_2_10_10_10_REV,
                        GL_UNSIGNED_INT_10F_11F_11F_REV)
sizeOfType = {GL_BYTE: 1,
             GL_UNSIGNED_BYTE: 1,
             GL_SHORT: 2,
             GL_UNSIGNED_SHORT: 2,
             GL_INT: 4,
             GL_UNSIGNED_INT: 4,
             GL_HALF_FLOAT: 2,
             GL_FLOAT: 4,
             GL_DOUBLE: 8,
             GL_FIXED: 4,
             GL_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_2_10_10_10_REV: 4,
             GL_UNSIGNED_INT_10F_11F_11F_REV: 4}
drawModes = (GL_POINTS,
            GL_LINE_STRIP,
            GL_LINE_LOOP,
            GL_LINES,
            GL_LINE_STRIP_ADJACENCY,
            GL_LINES_ADJACENCY,
            GL_TRIANGLE_STRIP,
            GL_TRIANGLE_FAN,
            GL_TRIANGLES,
            GL_TRIANGLE_STRIP_ADJACENCY,
            GL_TRIANGLES_ADJACENCY,
            GL_PATCHES)
indexTypeFromSize = {1: GL_UNSIGNED_BYTE, 2: GL_UNSIGNED_SHORT, 4: GL_UNSIGNED_INT}


def _loadMesh_v0(fh, vao, bufs):
    vertexCount = struct.unpack('I', fh.read(4))[0]
    vertexSize = struct.unpack('B', fh.read(1))[0]
    indexCount = struct.unpack('I', fh.read(4))[0]
    indexSize = struct.unpack('B', fh.read(1))[0]
    assert indexSize in indexTypeFromSize, 'Unknown element data type, element size must be one of %s' % indexTypeFromSize.keys()
    indexType = indexTypeFromSize[indexSize]
    drawMode = struct.unpack('I', fh.read(4))[0]
    assert drawMode in (GL_LINES, GL_TRIANGLES), 'Unknown draw mode.'  # TODO: list all render types
  
    # gather layout
    numAttributes = struct.unpack('B', fh.read(1))[0]
    offset = 0
    layouts = []
    for i in xrange(numAttributes):
        location = struct.unpack('B', fh.read(1))[0]
        dimensions = struct.unpack('B', fh.read(1))[0]
        assert dimensions in (1, 2, 3, 4)
        dataType = struct.unpack('I', fh.read(4))[0]
        assert dataType in attributeElementTypes, 'Invalid GLenum value for attribute element type.'
        layouts.append((location, dimensions, dataType, offset))
        offset += dimensions * sizeOfType[dataType]
  
    # apply layout
    for layout in layouts:
        glVertexAttribPointer(layout[0], layout[1], layout[2], GL_FALSE, offset, ctypes.c_void_p(layout[3]))  # total offset is now stride
        glEnableVertexAttribArray(layout[0])
  
    raw = fh.read(vertexSize * vertexCount)
    glBufferData(GL_ARRAY_BUFFER, vertexSize * vertexCount, raw, GL_STATIC_DRAW)
    raw = fh.read(indexSize * indexCount)
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexSize * indexCount, raw, GL_STATIC_DRAW)
  
    assert len(fh.read()) == 0, 'Expected end of file, but file is longer than it indicates'
    return Mesh(vao, bufs, drawMode, indexCount, indexType)


class Mesh(object):
    def __init__(self, vao, bufs, drawMode, indexCount, indexType):
        self.__vao = vao
        self.__bufs = bufs
        self.__drawMode = drawMode
        self.__indexCount = indexCount
        self.__indexType = indexType
  
    def __del__(self):
        glDeleteBuffers(len(self.__bufs), self.__bufs)
        glDeleteVertexArrays(1, [self.__vao])
  
    def draw(self):
        glBindVertexArray(self.__vao)
        glDrawElements(self.__drawMode, self.__indexCount, self.__indexType, None)


def model(filePath):
    vao = glGenVertexArrays(1)
    glBindVertexArray(vao)
    bufs = glGenBuffers(2)
    glBindBuffer(GL_ARRAY_BUFFER, bufs[0])
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bufs[1])
    with open(filePath, 'rb') as fh:
        fileVersion = struct.unpack('B', fh.read(1))[0]
        if fileVersion == 0:
            return _loadMesh_v0(fh, vao, bufs)
        raise RuntimeError('Unknown mesh file version %s in %s' % (fileVersion, filePath))


# import the necessary modules
import time
from PyQt4.QtCore import *  # QTimer
from PyQt4.QtGui import *  # QApplication
from PyQt4.QtOpenGL import *  # QGLWidget
from OpenGL.GL import *  # OpenGL functionality


# this is the basic window
class OpenGLView(QGLWidget):
    def initializeGL(self):
        # set the RGBA values of the background
        glClearColor(0.1, 0.2, 0.3, 1.0)
  
        # set a timer to redraw every 1/60th of a second
        self.__timer = QTimer()
        self.__timer.timeout.connect(self.repaint)
        self.__timer.start(1000 / 60)
  
        # import a model
        self.__mesh = model(r'C:\Users\John\Python\maya\cube.bm')
  
    def resizeGL(self, width, height):
        glViewport(0, 0, width, height)
  
    def paintGL(self):
        glLoadIdentity()
        glScalef(self.height() / float(self.width()), 1.0, 1.0)
        glRotate((time.time() % 36.0) * 10, 0, 0, 1)
  
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
        self.__mesh.draw()


# this initializes Qt
app = QApplication([])
# this creates the openGL window, but it isn't initialized yet
window = OpenGLView()
# this only schedules the window to be shown on the next Qt update
window.show()
# this starts the Qt main update loop, it avoids python from continuing beyond this line
# and any Qt stuff we did above is now going to actually get executed, along with any future
# events like mouse clicks and window resizes
app.exec_()

Part 2: Creating an OpenGL friendly mesh exporter for Maya

Part 2: Creating an exporter

This is part 2 of a series and it is about getting started with visualizing triangle meshes with Python 2.7 using the libraries PyOpenGL and PyQt4.

Part 1
Part 2
Part 3

I will assume you know python, you will not need a lot of Qt or OpenGL experience, though I will also not go into the deeper details of how OpenGL works. For that I refer you to official documentation and the excellent (C++) tutorials at https://open.gl/. Although they are C++, there is a lot of explanation about OpenGL and why to do certain calls in a certain order.

On a final note: I will make generalizations and simplifications when explaining things. If you think something works different then I say it probably does, this is to try and convey ideas to beginners, not to explain low level openGL implementations.

2.1 File layout

Now that we can draw a model, it is time to define the data that we need to give OpenGL and decide upon a file format that can contain all this data.

Starting off with the elements, we have a way to draw them (GL_TRIANGLES in our case) and we have a data type (GL_UNSIGNED_INT in our case). Given this data type and the number of elements we can actually determine the buffer size regardless of the data type, allowing our file to support not only a number of element values, but allowing those values to be of all the supported types.

Similarly we can look at the vertex layout. We probably want a vertex count and a size of per-vertex-data. This size is a little more complicated because the attribute layout can be very flexible. I suppose it’s easier if we also write the vertex element size instead of trying to figure out what it should be based on the layout.

Then we can look at the attribute layout. We can assume all our data is tightly packed, so we can infer the offset (last argument of glVertexAttribPointer). That leaves us with a layout location, a number of values per vertex, and a data type. Of course first we need to write how many attributes we have.

After that all we need to do is fill in the buffer data. So for vertexCount * vertexElementSize bytes we specify binary data for the vertex buffer and for elementCount * elementDataSize we specify binary data for the elements buffer.

Our file format now looks like this:

Version nr byte So we can change the format later and not break things.
Vertex count unsigned int Because the elements_array can use at most an unsigned int we can never point to vertices beyond the maximum of this data, so no need to store more bytes.
Vertex element size byte Size in bytes of a vertex, based on all attribute sizes combined.
Element count unsigned int
Element data size byte To infer whether indices are unsigned char, unsigned short or unsigned int.
Render type GLenum OpenGL defines variables as GL_TRIANGLES as a GLenum type which is in turn just an unsigned int.
Number of attributes byte
[For each attribute]
Attribute location byte
Attribute dimensions byte Is it a single value, vec2, 3 or 4?
Attribute type GLenum Are the values float, or int, more types listed in the OpenGL documenation for glVertexAttribuPointer.
[End for]
Vertex buffer vertexCount * vertexElementSize bytes
Elements buffer elementCount * elementDataSize bytes

That brings us to the next step, gather this information in Maya.

2.2 Maya mesh exporter

To export I’ll use the maya API (OpenMaya). It provides a way to quickly iterate over a mesh’ data without allocating too much memory using the MItMeshPolygons. This will iterate over all the faces and allow us to extract the individual triangles and face vertices.

There are a few steps to do. First let’s make a script to generate a test scene:

from maya import cmds
from maya.OpenMaya import *
cmds.file(new=True, force=True)
cmds.polyCube()
meshShapeName = 'pCubeShape1'
outputFilePath = 'C:/Test.bgm'

Now with these variables in mind we have to convert the shape name to actual maya API objects that we can read data from.

# get an MDagPath from the given mesh path
p = MDagPath()
l = MSelectionList()
MGlobal.getSelectionListByName(mayaShapeName, l)
l.getDagPath(0, p)

# get the iterator
poly = MItMeshPolygon(p)

This sets us up to actually start saving data. Because openGL requires us to provide all the data of a vertex at 1 vertex index we have to remap some of Maya’s data. In Maya a vertex (actually face-vertex in Maya terms) is a list of indices that points to e.g. what vertex to use, what normal to use, etc. All with separate indices. In OpenGL all these indices must match. The way I’ll go about this is to simply take the triangulation and generate 3 unique vertices for each triangle. This means that to find the vertex count can be determined by counting the triangles in the mesh. Maya meshes don’t expose functionality to query this , so instead I’ll iterate over all the faces and count the triangles in them.

# open the file as binary
with open(outputFilePath, 'wb') as fh:
    # fixing the vertex data size to just X, Y, Z floats for the vertex position
    vertexElementSize = 3 * ctypes.sizeof(ctypes.c_float)
  
    # using unsigned integers as elements
    indexElementSize = ctypes.sizeof(ctypes.c_uint)
  
    # gather the number of vertices
    vertexCount = 0
    while not poly.isDone():
        vertices = MPointArray()
        vertexList = MIntArray()
        poly.getTriangles(vertices, vertexList, space)
        vertexCount += vertexList.length()
        poly.next()
    poly.reset()
    # start writing
    fh.write(struct.pack('B', FILE_VERSION))
    fh.write(struct.pack('I', vertexCount))
    fh.write(struct.pack('B', vertexElementSize))
    # currently I'm duplicating all vertices per triangle, so total indices matches    total vertices
    fh.write(struct.pack('I', vertexCount))
    fh.write(struct.pack('B', indexElementSize))
    fh.write(struct.pack('I', GL_TRIANGLES))  # render type

As you can see we had to make some assumptions about vertex data size and we had to gather some intel on our final vertex count, but this is a good setup. Next step is to write the attribute layout. I’ve made the assumption here to write only X Y Z position floats at location 0. We can expand the exporter later with more features, as our file format supports variable attribute layouts. We can write our position attribute next:

# attribute layout
# 1 attribute
fh.write(struct.pack('B', 1))
# at location 0
fh.write(struct.pack('B', 0))
# of 3 floats
fh.write(struct.pack('B', 3))
fh.write(struct.pack('I', GL_FLOAT))

Note that I am using a constant GL_FLOAT here, if you do not wish to install PyOpenGL for your maya, you can quite simply include this at the top of the file instead:

import ctypes
GL_TRIANGLES = 0x0004
GL_UNSIGNED_INT = 0x1405
GL_FLOAT = 0x1406

After that comes streaming the vertex buffer. For this I use the same iterator I used to count the vertex count. The code is pretty much the same only now I write the vertices instead of counting the vertex list.

# iter all faces
while not poly.isDone():
    # get triangulation of this face
    vertices = MPointArray()
    vertexList = MIntArray()
    poly.getTriangles(vertices, vertexList, space)
  
    # write the positions
    for i in xrange(vertexList.length()):
        fh.write(struct.pack('3f', vertices[i][0], vertices[i][1], vertices[i][2]))
  
    poly.next()

Last is the element buffer.

# write the elements buffer
for i in xrange(vertexCount):
   fh.write(struct.pack('I', i))

2.3 All the data

The next step naturally is to export more than just the position. Here is a more elaborate way to extract all the attributes. First we need to get some global data from the mesh. This goes right after where we create the MItMeshPolygons.

fn = MFnMesh(p)
tangents = MFloatVectorArray()
fn.getTangents(tangents, space)
colorSetNames = []
fn.getColorSetNames(colorSetNames)
uvSetNames = []
fn.getUVSetNames(uvSetNames)

Next we have to change our vertexElementSize code to the following:

# compute the vertex data size, write 4 floats for the position for more convenient transformation in shaders
# position, tangent, normal, color sets, uv sets
vertexElementSize = (4 + 3 + 3 + 4 * len(colorSetNames) + 2 * len(uvSetNames)) * ctypes.sizeof(ctypes.c_float)

The attribute layout is significantly changed. I’m also changing the point data from a vec3 to a vec4. I’m filling in the w component as 1.0, this to indicate a point instead of a vector. It will make transforming vertices in shaders a step simpler.

# attribute layout

# Since NVidia is the only driver to implement a default attribute layout I am following this as much as possible
# on other drivers using a custom shader is mandatory and modern buffers will never work with the fixed function pipeline.
# http://developer.download.nvidia.com/opengl/glsl/glsl_release_notes.pdf
# https://stackoverflow.com/questions/20573235/what-are-the-attribute-locations-for-fixed-function-pipeline-in-opengl-4-0-cor

# num attributes
fh.write(struct.pack('B', 3 + len(colorSetNames) + len(uvSetNames)))
# vec4 position at location 0
fh.write(struct.pack('B', 0))
fh.write(struct.pack('B', 4))
fh.write(struct.pack('I', GL_FLOAT))
# vec3 tangent at location 1
fh.write(struct.pack('B', 1))
fh.write(struct.pack('B', 3))
fh.write(struct.pack('I', GL_FLOAT))
# vec3 normal at location 2
fh.write(struct.pack('B', 2))
fh.write(struct.pack('B', 3))
fh.write(struct.pack('I', GL_FLOAT))
# vec4 color at locations (3,7) and 16+
used = {}
for i in xrange(len(colorSetNames)):
    idx = 3 + i
    if idx > 7:
        idx = 11 + i
        used.add(idx)
    fh.write(struct.pack('B', idx))
    fh.write(struct.pack('B', 4))
    fh.write(struct.pack('I', GL_FLOAT))
# vec2 uvs at locations 8-15 and 16+, but avoiding overlap with colors
idx = 8
for i in xrange(len(uvSetNames)):
    while idx in used:
        idx += 1
    fh.write(struct.pack('B', idx))
    fh.write(struct.pack('B', 2))
    fh.write(struct.pack('I', GL_FLOAT))
    idx += 1

Most of the MItMeshPolygon iterator functions, like getNormals(), gives us a list of the normals for all vertices in this face. The problem is that this data is not triangulated.

To extract the triangulation we used getTriangles(), which gives us a list of vertices used in the face. These vertex numbers are object-wide, so they keep getting bigger the further we get.

That means they’re useless if we want to use them to look up the normal returned by getNormals(), because that array is always very short, containing just the normals for this face.

So we have to do some mapping from the triangulated vertex indices into indices that match the data we’ve got. Either that or get all the normals from the mesh in 1 big array but that is not memory efficient. So at the top of the while loop (just inside) I’ve added the following dictionary:

# map object indices to local indices - because for some reason we can not query the triangulation as local indices
# but all getters do want us to provide local indices
objectToFaceVertexId = {}
count = poly.polygonVertexCount()
for i in xrange(count):
    objectToFaceVertexId[poly.vertexIndex(i)] = i

That allows us to extract all the data we want for these triangles like so:

# get per-vertex data
normals = MVectorArray()
poly.getNormals(normals, space)
colorSet = []
for i, colorSetName in enumerate(colorSetNames):
    colorSet.append(MColorArray())
    poly.getColors(colorSet[i], colorSetName)
uvSetU = []
uvSetV = []
for i, uvSetName in enumerate(uvSetNames):
    uvSetU.append(MFloatArray())
    uvSetV.append(MFloatArray())
    poly.getUVs(uvSetU[i], uvSetV[i], uvSetName)

Handling fairly small sets of data at a time. Last we have to write the data, replacing the loop writing 3 floats per vertex we had before with this longer loop:

# write the data
for i in xrange(vertexList.length()):
    localVertexId = objectToFaceVertexId[vertexList[i]]
    tangentId = poly.tangentIndex(localVertexId)
  
    fh.write(struct.pack('4f', vertices[i][0], vertices[i][1], vertices[i][2], 1.0))
    fh.write(struct.pack('3f', tangents[tangentId][0], tangents[tangentId][1], tangents[tangentId][2]))
    fh.write(struct.pack('3f', normals[localVertexId][0], normals[localVertexId][1], normals[localVertexId][2]))
    for j in xrange(len(colorSetNames)):
        fh.write(struct.pack('4f', colorSet[j][localVertexId][0], colorSet[j][localVertexId][1], colorSet[j][localVertexId][2], colorSet[j][localVertexId][3]))
    for j in xrange(len(uvSetNames)):
        fh.write(struct.pack('2f', uvSetU[j][localVertexId], uvSetV[j][localVertexId]))

And that completes the exporter with full functionality, extracting all possible data from a maya mesh we want. Unless you want blind data and skin clusters, but that’s a whole different story!

2.4 Code

Here is the final code as a function, with an additional function to export multiple selected meshes to multiple files, using Qt for UI. Note that if you wish to use PySide or PyQt5 instead the QFileDialog.getExistingDirectory and QSettings.value return types are different and require some work.

import os
import struct
from maya import cmds
from maya.OpenMaya import *
import ctypes

GL_TRIANGLES = 0x0004
GL_UNSIGNED_INT = 0x1405
GL_FLOAT = 0x1406
FILE_EXT = '.bm'  # binary mesh
FILE_VERSION = 0
EXPORT_SPACE = MSpace.kWorld  # export meshes in world space for now


def exportMesh(mayaShapeName, outputFilePath, space):
    # get an MDagPath from the given mesh path
    p = MDagPath()
    l = MSelectionList()
    MGlobal.getSelectionListByName(mayaShapeName, l)
    l.getDagPath(0, p)
  
    # get the mesh and iterator
    fn = MFnMesh(p)
    poly = MItMeshPolygon(p)
  
    tangents = MFloatVectorArray()
    fn.getTangents(tangents, space)
    colorSetNames = []
    fn.getColorSetNames(colorSetNames)
    uvSetNames = []
    fn.getUVSetNames(uvSetNames)
  
    # open the file as binary
    with open(outputFilePath, 'wb') as fh:
        # compute the vertex data size, write 4 floats for the position for more convenient transformation in shaders
        # position, tangent, normal, color sets, uv sets
        vertexElementSize = (4 + 3 + 3 + 4 * len(colorSetNames) + 2 * len(uvSetNames)) * ctypes.sizeof(ctypes.c_float)
  
        # using unsigned integers as elements
        indexElementSize = ctypes.sizeof(ctypes.c_uint)
  
        # gather the number of vertices
        vertexCount = 0
        while not poly.isDone():
            vertices = MPointArray()
            vertexList = MIntArray()
            poly.getTriangles(vertices, vertexList, space)
            vertexCount += vertexList.length()
            poly.next()
        poly.reset()
  
        # start writing
        fh.write(struct.pack('B', FILE_VERSION))
        fh.write(struct.pack('I', vertexCount))
        fh.write(struct.pack('B', vertexElementSize))
        # currently I'm duplicating all vertices per triangle, so total indices matches total vertices
        fh.write(struct.pack('I', vertexCount))
        fh.write(struct.pack('B', indexElementSize))
        fh.write(struct.pack('I', GL_TRIANGLES))  # render type
  
        # attribute layout
  
        # Since NVidia is the only driver to implement a default attribute layout I am following this as much as possible
        # on other drivers using a custom shader is mandatory and modern buffers will never work with the fixed function pipeline.
        # http://developer.download.nvidia.com/opengl/glsl/glsl_release_notes.pdf
        # https://stackoverflow.com/questions/20573235/what-are-the-attribute-locations-for-fixed-function-pipeline-in-opengl-4-0-cor
  
        # num attributes
        fh.write(struct.pack('B', 3 + len(colorSetNames) + len(uvSetNames)))
        # vec4 position at location 0
        fh.write(struct.pack('B', 0))
        fh.write(struct.pack('B', 4))
        fh.write(struct.pack('I', GL_FLOAT))
        # vec3 tangent at location 1
        fh.write(struct.pack('B', 1))
        fh.write(struct.pack('B', 3))
        fh.write(struct.pack('I', GL_FLOAT))
        # vec3 normal at location 2
        fh.write(struct.pack('B', 2))
        fh.write(struct.pack('B', 3))
        fh.write(struct.pack('I', GL_FLOAT))
        # vec4 color at locations (3,7) and 16+
        used = {}
        for i in xrange(len(colorSetNames)):
            idx = 3 + i
            if idx > 7:
                idx = 11 + i
                used.add(idx)
            fh.write(struct.pack('B', idx))
            fh.write(struct.pack('B', 4))
            fh.write(struct.pack('I', GL_FLOAT))
        # vec2 uvs at locations 8-15 and 16+, but avoiding overlap with colors
        idx = 8
        for i in xrange(len(uvSetNames)):
            while idx in used:
                idx += 1
            fh.write(struct.pack('B', idx))
            fh.write(struct.pack('B', 2))
            fh.write(struct.pack('I', GL_FLOAT))
            idx += 1
  
        # iter all faces
        while not poly.isDone():
            # map object indices to local indices - because for some reason we can not query the triangulation as local indices
            # but all getters do want us to provide local indices
            objectToFaceVertexId = {}
            count = poly.polygonVertexCount()
            for i in xrange(count):
                objectToFaceVertexId[poly.vertexIndex(i)] = i
  
            # get triangulation of this face
            vertices = MPointArray()
            vertexList = MIntArray()
            poly.getTriangles(vertices, vertexList, space)
  
            # get per-vertex data
            normals = MVectorArray()
            poly.getNormals(normals, space)
            colorSet = []
            for i, colorSetName in enumerate(colorSetNames):
                colorSet.append(MColorArray())
                poly.getColors(colorSet[i], colorSetName)
            uvSetU = []
            uvSetV = []
            for i, uvSetName in enumerate(uvSetNames):
                uvSetU.append(MFloatArray())
                uvSetV.append(MFloatArray())
                poly.getUVs(uvSetU[i], uvSetV[i], uvSetName)
  
            # write the data
            for i in xrange(vertexList.length()):
                localVertexId = objectToFaceVertexId[vertexList[i]]
                tangentId = poly.tangentIndex(localVertexId)
  
                fh.write(struct.pack('4f', vertices[i][0], vertices[i][1], vertices[i][2], 1.0))
                fh.write(struct.pack('3f', tangents[tangentId][0], tangents[tangentId][1], tangents[tangentId][2]))
                fh.write(struct.pack('3f', normals[localVertexId][0], normals[localVertexId][1], normals[localVertexId][2]))
                for j in xrange(len(colorSetNames)):
                    fh.write(struct.pack('4f', colorSet[j][localVertexId][0], colorSet[j][localVertexId][1], colorSet[j][localVertexId][2], colorSet[j][localVertexId][3]))
                for j in xrange(len(uvSetNames)):
                    fh.write(struct.pack('2f', uvSetU[j][localVertexId], uvSetV[j][localVertexId]))
  
            poly.next()
  
        # write the elements buffer
        for i in xrange(vertexCount):
            fh.write(struct.pack('I', i))


def exportSelected():
    selectedMeshShapes = cmds.select(ls=True, type='mesh', l=True) or []
    selectedMeshShapes += cmds.listRelatives(cmds.select(ls=True, type='transform', l=True) or [], c=True, type='mesh', f=True) or []
    from PyQt4.QtCore import QSettings
    from PyQt4.QtGui import QFileDialog
    settings = QSettings('GLMeshExport')
    mostRecentDir = str(settings.value('mostRecentDir').toPyObject())
    targetDir = QFileDialog.getExistingDirectory(None, 'Save selected meshes in directory', mostRecentDir)
    if targetDir and os.path.exists(targetDir):
        settings.setValue('mostRecentDir', targetDir)
        for i, shortName in enumerate(cmds.ls(selectedMeshShapes)):
            exportMesh(selectedMeshShapes[i],
                       os.path.join(targetDir, shortName.replace('|', '_'), FILE_EXT),
                       EXPORT_SPACE)