Attribute editor in PyQt

I’ve been working on a particle editor. It isn’t entirely done yet, but I did create something interesting in the process: an attribute editor for arbitrary Python objects.

This is where I’m at right now; I hope to work on this more and share details about the particles themselves once it is more complete.

On the right you see an editor for the following object:

class ParticleSettings(OrderedClass):
    def __init__(self):
        super(ParticleSettings, self).__init__()
        # emitter
        self.emitterType = Enum(('Sphere', 'Cone', 'Box'), 0)
        self.emitterSettings = Vec3([0.5, 0.5, 0.5])
        self.emitterIsVolume = True
        self.randomDirection = False
        # not curve based
        self.startSize = RandomFloat()
        self.startSpeed = RandomFloat()
        self.startRotation = RandomFloat()
        self.lifeTime = RandomFloat()
        # curve based on particle alive time / life time
        self.sizeOverTime = RandomChannelFloat()
        self.angularVelocity = RandomChannelFloat()
        self.velocityOverTime = RandomChannelVec3()

I’ve added some data types so I can visualize them better, but the attribute editing framework I wrote works out of the box on Python’s basic types.
I’d like to break down how I got here. I wrote a heap of code which still needs a heap of refactoring before it is presentable, so instead I’ll demonstrate creating a more basic example, which should be more useful because it isn’t cluttered with my edge cases.

Preparing edit widgets

First, I made some widgets to edit some basic data types. Because we want to generate and connect our widgets to data, it’d be nice if they all have the same interface to keep the rest of our code abstract. I went for the following interface:

class AEComponent(QObject):
    # Attribute editor widget editing a single value type. Note that UI interactions from the user should emit valueChanged.
    valueChanged = pyqtSignal(object)

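    def __init__(self):
        # The constructor may accept additional arguments, e.g. default value or enum options
        super(AEComponent, self).__init__()  # a QObject subclass must initialize its base, or the signal will not work
        self._value = None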

    def value(self):
        # Return the internal value
        return self._value

    def setValue(self, value):
        # setValue should programmatically adjust the internal value without emitting a signal; used when multiple setValue calls may be triggered or when a parent widget is already going to send a change event.
        self.blockSignals(True)
        self._value = value
        self.blockSignals(False)

    def editValue(self, value):
        # Set the value and emit a change event
        self._value = value
        self.valueChanged.emit(value)

Note that this is an example of the interface, not a base class. I will not actually use the code above; I’ll just subclass Qt widgets and make them behave the same.
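
To make the contract concrete, here is a minimal sketch that runs without Qt. The Signal and ValueHolder classes are hypothetical stand-ins that I made up for illustration (they are not part of the editor); the point is only that setValue stays silent while editValue notifies listeners.

```python
class Signal(object):
    """Hypothetical, minimal stand-in for pyqtSignal so the sketch runs without Qt."""
    def __init__(self):
        self._callbacks = []

    def connect(self, callback):
        self._callbacks.append(callback)

    def emit(self, value):
        for callback in self._callbacks:
            callback(value)


class ValueHolder(object):
    """Follows the interface above: value, setValue (silent) and editValue (notifies)."""
    def __init__(self, value=None):
        self.valueChanged = Signal()
        self._value = value

    def value(self):
        return self._value

    def setValue(self, value):
        # programmatic update: listeners are not notified
        self._value = value

    def editValue(self, value):
        # user-style edit: store the value and notify listeners
        self._value = value
        self.valueChanged.emit(value)


received = []
holder = ValueHolder()
holder.valueChanged.connect(received.append)
holder.setValue(1)   # silent
holder.editValue(2)  # notifies listeners
```

After running this, received only contains the value passed to editValue.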

The core data types I want to support are:

int: QSpinBox
float: QDoubleSpinBox
bool: checkable QPushButton
str: QLineEdit
object: recurse into its properties
dict: recurse into its items
list: recurse into its items

I’m leaving out tuples & sets: tuples are immutable and sets can’t be indexed, so a widget would have to replace the entire container at once, which is hard to construct.
I do not intend to adjust the composition of lists, dicts and objects – so no element insertion / removal.

Int & double
QSpinBox already has value and setValue, but setValue emits a signal. Instead I’m adding an editValue that forwards to the inherited setValue, and I make setValue block the signals. I’ve also added a bits parameter that is used to infer the limits, so I can construct versions that only support the range of e.g. ctypes.c_char. It’d be trivial to extend this to unsigned variants. The LineEditSelected it uses is listed further down, in the Strings section; it simply selects all text when it receives focus.

class SpinBox(QSpinBox):
    """
    QSpinBox with right limits & that follows the AEComponent interface.
    """
    def __init__(self, value=0, bits=32):
        super(SpinBox, self).__init__()
        self.setMinimum(-2 ** (bits - 1))
        self.setMaximum(2 ** (bits - 1) - 1)
        self.setValue(value)
        self.setLineEdit(LineEditSelected())

    def setValue(self, value):
        self.blockSignals(True)
        super(SpinBox, self).setValue(value)
        self.blockSignals(False)

    def editValue(self, value):
        super(SpinBox, self).setValue(value)
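
As a rough sketch of the unsigned extension mentioned above, the limits could be computed by a small helper like this. The name intLimits and the signed flag are my own invention for illustration; keep in mind that, as far as I know, QSpinBox stores its range as a C++ int, so anything wider than signed 32-bit would not fit.

```python
def intLimits(bits, signed=True):
    """Return the (minimum, maximum) representable by an integer of the given bit width."""
    if signed:
        # two's complement range, the same formula the SpinBox constructor uses
        return -2 ** (bits - 1), 2 ** (bits - 1) - 1
    # unsigned integers start at zero and use every bit for magnitude
    return 0, 2 ** bits - 1


signedChar = intLimits(8)
unsignedChar = intLimits(8, signed=False)
```

The resulting pair could then be passed to setMinimum and setMaximum.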

Doubles are almost identical.

class DoubleSpinBox(QDoubleSpinBox):
    """
    QDoubleSpinBox with right limits & that follows the AEComponent interface.
    """
    def __init__(self, value=0.0):
        super(DoubleSpinBox, self).__init__()
        self.setMinimum(-float('inf'))
        self.setMaximum(float('inf'))
        self.setValue(value)
        self.setSingleStep(0.01)  # Depending on use case this can be very coarse.
        self.setLineEdit(LineEditSelected())

    def setValue(self, value):
        self.blockSignals(True)
        super(DoubleSpinBox, self).setValue(value)
        self.blockSignals(False)

    def editValue(self, value):
        super(DoubleSpinBox, self).setValue(value)

Booleans with icons
A few more things are needed to make this work on top of a checkable QPushButton: manual handling of the valueChanged signal & keeping track of the icon to use.

class IconBoolEdit(QPushButton):
    """
    QPushButton with icons to act as a boolean (not tri-state) toggle.
    """
    valueChanged = pyqtSignal(bool)

    def __init__(self, *args):
        super(IconBoolEdit, self).__init__(*args)
        self.__icons = icons.get('Unchecked'), icons.get('Checked')  # Implement your own way to get icons!
        self.setIcon(self.__icons[0])
        self.setCheckable(True)
        self.clicked.connect(self.__updateIcons)
        self.clicked.connect(self.__emitValueChanged)

    def setIcons(self, off, on):
        self.__icons = off, on
        self.__updateIcons(self.isChecked())

    def __updateIcons(self, state):
        self.setIcon(self.__icons[int(state)] or QIcon())

    def __emitValueChanged(self, state):
        self.valueChanged.emit(state)

    def value(self):
        return self.isChecked()

    def setValue(self, state):
        self.setChecked(state)
        self.__updateIcons(state)

    def editValue(self, state):
        self.setChecked(state)
        self.__updateIcons(state)
        self.__emitValueChanged(state)

Strings
This is very similar to the spinbox. One addition is to make sure clicking the line edit selects all contents, so a user can start typing a new word immediately.

class LineEdit(QLineEdit):
    valueChanged = pyqtSignal(str)

    def __init__(self, *args):
        super(LineEdit, self).__init__(*args)
        self.textChanged.connect(self.valueChanged.emit)

    def value(self):
        return self.text()

    def setValue(self, text):
        self.blockSignals(True)
        self.setText(text)
        self.blockSignals(False)

    def editValue(self, text):
        self.setText(text)


class LineEditSelected(LineEdit):
    def __init__(self):
        super(LineEditSelected, self).__init__()
        self.__state = False

    def focusInEvent(self, event):
        super(LineEditSelected, self).focusInEvent(event)
        self.selectAll()
        self.__state = True

    def mousePressEvent(self, event):
        super(LineEditSelected, self).mousePressEvent(event)
        if self.__state:
            self.selectAll()
            self.__state = False

Reflecting python objects

Reflection in Python is very easy, and our use case is simple.
Instances of ordinary Python classes have a __dict__ attribute that contains all the current members of the object (but not its methods).
In python we tend to denote protected (internal) data by prefixing variable names with an underscore.
So to find all attributes that we want to inspect we can simply do:

for name in instance.__dict__:
    if name[0] == '_':
        continue

Now to control such an attribute with a widget we need to construct the right widget and connect the change event to a setter.
In Python we can use the functools module to bind arguments to the built-in setattr function, which gives us a callback that performs a property assignment.

    value = getattr(instance, name)  # get the current value by name, like the dot operator but using a string to get to the right property
    cls = factory._findEditorForType(type(value))  # factory to get the right widget for our data type, more on this later
    widget = cls()  # construct the widget
    widget.setValue(value)  # set the editor's initial value to match our data
    widget.valueChanged.connect(functools.partial(setattr, instance, name))  # make the editor update our data
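
The binding trick can be tried without any widgets. In this small sketch the Settings class and the publicSetters helper are hypothetical names used only for illustration; each bound callable does what a connected valueChanged signal would do.

```python
import functools


class Settings(object):
    """Hypothetical example object."""
    def __init__(self):
        self.speed = 1.0
        self.name = 'emitter'
        self._cache = None  # internal, should be skipped


def publicSetters(instance):
    """Map each public attribute name to a callable that assigns a new value to it."""
    setters = {}
    for name in instance.__dict__:
        if name[0] == '_':
            continue
        # partial(setattr, instance, name)(value) is the same as setattr(instance, name, value)
        setters[name] = functools.partial(setattr, instance, name)
    return setters


settings = Settings()
setters = publicSetters(settings)
setters['speed'](2.5)  # same effect as settings.speed = 2.5
```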

Widget factory

The last piece of the puzzle is a way to generate widgets based on data types. I wanted to keep this abstract, so I made a class out of it.
We can register data type & widget type relations, and it also knows how to find a widget when one is registered for a base class of the type we’re querying.

class AEFactory(object):
    def __init__(self):
        self.__typeWidgets = {}

    def registerType(self, dataType, widgetConstructor):
        self.__typeWidgets[dataType] = widgetConstructor

    @staticmethod
    def _allBaseTypes(cls):
        """
        Recurse all base classes and return a list of all bases with most close relatives first.
        https://stackoverflow.com/questions/1401661/list-all-base-classes-in-a-hierarchy-of-given-class
        """
        result = list(cls.__bases__)
        for base in cls.__bases__:  # loop over the direct bases; looping over result while extending it adds duplicates
            result.extend(AEFactory._allBaseTypes(base))
        return result

    def _findEditorForType(self, dataType):
        if dataType in self.__typeWidgets:
            return self.__typeWidgets[dataType]

        for baseType in AEFactory._allBaseTypes(dataType):
            if baseType in self.__typeWidgets:  # test the base type, not dataType again
                return self.__typeWidgets[baseType]
        return None
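
For new-style classes, a similar lookup can lean on the method resolution order that Python already computes. This is only a sketch of that alternative: TypeRegistry, Percentage and the string "editors" are hypothetical names I use to keep it runnable without Qt.

```python
class TypeRegistry(object):
    """Hypothetical registry that resolves editors through a type's __mro__."""
    def __init__(self):
        self._entries = {}

    def register(self, dataType, editor):
        self._entries[dataType] = editor

    def find(self, dataType):
        # __mro__ starts with the type itself, followed by its bases in resolution order
        for cls in dataType.__mro__:
            if cls in self._entries:
                return self._entries[cls]
        return None


class Percentage(float):
    pass


registry = TypeRegistry()
registry.register(int, 'SpinBox')
registry.register(float, 'DoubleSpinBox')
```

Note that bool is a subclass of int, so a base class lookup hands booleans the int editor unless bool is registered explicitly.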

Complex data

Now this will work fine for simple objects with simple data types. But the real fun begins when we have instances whose properties are lists of other instances.
Our _findEditorForType will return None in this case and we get an error. Instead, we should split this up into several steps. First we determine the kind of data we’re dealing with, and for compound data we defer to a function that recurses into it until we reach simple data types for which we can generate widgets.

from collections import OrderedDict
import functools

class AEFactory(object):

    # ... the above code still goes here ...

    def generate(self, data, parent=None, name=None):
        """
        This recursively generates widgets & returns an iterator of every resulting item.
        """
        if isinstance(data, (dict, OrderedDict)):
            generator = self._generateMap(data)
        elif hasattr(data, '__getitem__') and hasattr(data, '__setitem__'):
            generator = self._generateList(data)
        elif hasattr(data, '__dict__'):
            generator = self._generateInstance(data)
        else:
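            generator = self._generateField(data, parent, name)

        # without this, generate would return None and the recursive calls below would fail
        for widget in generator:
            yield widget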

    def _generateField(self, data, parent, name):
        cls = self._findEditorForType(type(data))
        assert cls, 'Error: could not inspect object "%s" (parent: %s, name: %s). No wrapper registered or non-supported compound type.' % (data, parent, name)
        widget = cls()
        widget.setValue(data)
        if hasattr(parent, '__setitem__'):
            # lists & dicts: assign the item, setattr would not touch the container's contents
            widget.valueChanged.connect(functools.partial(parent.__setitem__, name))
        else:
            widget.valueChanged.connect(functools.partial(setattr, parent, name))
        yield widget

    def _generateInstance(self, data):
        for name in data.__dict__:
            if name[0] == '_':
                continue
            yield QLabel(name)
            for widget in self.generate(getattr(data, name), data, name):
                yield widget

    def _generateList(self, data):
        for i in xrange(len(data)):
            yield QLabel(str(i))
            for widget in self.generate(data[i], data, i):  # pass the index itself so it can be used for item assignment
                yield widget

    def _generateMap(self, data):
        for key in data:
            yield QLabel(str(key))
            for widget in self.generate(data[key], data, key):
                yield widget
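
The recursion itself can be exercised without Qt. The sketch below is a simplified, hypothetical variant (walk and Point are names I made up): instead of widgets it yields a path, the current value and a setter for every leaf, using item assignment for containers and attribute assignment for instances.

```python
import functools


def walk(data, parent=None, key=None, path=''):
    """Yield (path, value, setter) for every leaf value reachable from data."""
    if isinstance(data, dict):
        for k in data:
            for leaf in walk(data[k], data, k, '%s[%r]' % (path, k)):
                yield leaf
    elif isinstance(data, list):
        for i in range(len(data)):
            for leaf in walk(data[i], data, i, '%s[%d]' % (path, i)):
                yield leaf
    elif hasattr(data, '__dict__'):
        for name in data.__dict__:
            if name[0] == '_':
                continue
            for leaf in walk(getattr(data, name), data, name, '%s.%s' % (path, name)):
                yield leaf
    elif isinstance(parent, (dict, list)):
        # leaf inside a container: assign by key or index
        yield path, data, functools.partial(parent.__setitem__, key)
    else:
        # leaf on an instance: assign by attribute name
        yield path, data, functools.partial(setattr, parent, key)


class Point(object):
    def __init__(self):
        self.x = 1.0
        self.tags = ['a', 'b']


point = Point()
leaves = dict((path, setter) for path, value, setter in walk(point))
leaves['.tags[1]']('c')  # edits the list item in place
leaves['.x'](3.0)        # edits the attribute
```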

Formatting

If we have a class that needs a special widget or layout, like my particle editor, we may wish to grab the widgets generated for that class and manipulate them.
One case I have is a random channel, which has a minimum, a maximum and an isRandom flag. If isRandom is turned off then I just want to show the minimum field because the maximum is unused. In order to do this I extended the factory with the ability to inject functions that post-process the group of widgets generated for a certain data type. See registerWrapper, _findWrapperForType and the modifications at the end of generate.

class AEFactory(object):
    def __init__(self):
        self.__typeWidgets = {}
        self.__typeWrappers = {}

    def registerType(self, dataType, widgetConstructor):
        self.__typeWidgets[dataType] = widgetConstructor

    def registerWrapper(self, dataType, wrapperFunction):
        """
        The wrapperFunction must accept a generator of widgets & return a generator of widgets.
        """
        self.__typeWrappers[dataType] = wrapperFunction

    @staticmethod
    def _allBaseTypes(cls):
        """
        Recurse all base classes and return a list of all bases with most close relatives first.
        https://stackoverflow.com/questions/1401661/list-all-base-classes-in-a-hierarchy-of-given-class
        """
        result = list(cls.__bases__)
        for base in cls.__bases__:  # loop over the direct bases; looping over result while extending it adds duplicates
            result.extend(AEFactory._allBaseTypes(base))
        return result

    def _findEditorForType(self, dataType):
        if dataType in self.__typeWidgets:
            return self.__typeWidgets[dataType]

        for baseType in AEFactory._allBaseTypes(dataType):
            if baseType in self.__typeWidgets:  # test the base type, not dataType again
                return self.__typeWidgets[baseType]
        return None

    def _findWrapperForType(self, dataType):
        if dataType in self.__typeWrappers:
            return self.__typeWrappers[dataType]

        for baseType in AEFactory._allBaseTypes(dataType):
            if baseType in self.__typeWrappers:  # test the base type, not dataType again
                return self.__typeWrappers[baseType]
        return None

    def generate(self, data, parent=None, name=None):
        """
        This recursively generates widgets & returns an iterator of every resulting item.
        """
        if isinstance(data, (dict, OrderedDict)):
            generator = self._generateMap(data)
        elif hasattr(data, '__getitem__') and hasattr(data, '__setitem__'):
            generator = self._generateList(data)
        elif hasattr(data, '__dict__'):
            generator = self._generateInstance(data)
        else:
            generator = self._generateField(data, parent, name)

        wrapper = self._findWrapperForType(type(data))
        if wrapper:
            generator = wrapper(generator)
        for widget in generator:
            yield widget

    def _generateField(self, data, parent, name):
        cls = self._findEditorForType(type(data))
        assert cls, 'Error: could not inspect object "%s" (parent: %s, name: %s). No wrapper registered or non-supported compound type.' % (data, parent, name)
        widget = cls()
        widget.setValue(data)
        if hasattr(parent, '__setitem__'):
            # lists & dicts: assign the item, setattr would not touch the container's contents
            widget.valueChanged.connect(functools.partial(parent.__setitem__, name))
        else:
            widget.valueChanged.connect(functools.partial(setattr, parent, name))
        yield widget

    def _generateInstance(self, data):
        for name in data.__dict__:
            if name[0] == '_':
                continue
            yield QLabel(name)
            for widget in self.generate(getattr(data, name), data, name):
                yield widget

    def _generateList(self, data):
        for i in xrange(len(data)):
            yield QLabel(str(i))
            for widget in self.generate(data[i], data, i):  # pass the index itself so it can be used for item assignment
                yield widget

    def _generateMap(self, data):
        for key in data:
            yield QLabel(str(key))
            for widget in self.generate(data[key], data, key):
                yield widget

Note: I currently allow it to work on sub classes, with the risk of that subclass having extra attributes – or a different attribute order – resulting in the widgets being jumbled & my function breaking the layout completely. I’m not sure yet how to validate that a sub class matches the base class’ member layout, so maybe I should just allow explicit overrides for a single type without inheritance support.
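
A wrapper is just a generator that consumes the stream of generated items and yields something else. As a hedged illustration that runs without Qt, this hypothetical pairUp wrapper turns a flat label, editor, label, editor stream into pairs; it assumes every label is followed by exactly one editor, which only holds for instances with simple fields.

```python
def pairUp(items):
    """Wrapper-style function: takes a generator of items, yields (label, editor) pairs."""
    iterator = iter(items)
    for label in iterator:
        editor = next(iterator, None)  # None if a label has no editor following it
        yield label, editor


def generateFlat():
    # hypothetical stand-in for the label/editor stream the factory produces
    for item in ('x', 'editor for x', 'y', 'editor for y'):
        yield item


rows = list(pairUp(generateFlat()))
```

A real wrapper would do the same with widgets, for example placing each pair in a row layout before yielding it.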

Constraining class member order

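One thing that annoys me, and maybe you noticed already, is that Python does not guarantee that dictionaries are ordered (at least not the Python 2 used here; plain dicts only keep insertion order from Python 3.7 onward).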
For this the collections.OrderedDict type exists, but when dealing with class members and the __dict__ attribute we have no control over this.

Now my solution to this is pretty shaky, and I’m definitely not proud of what I came up with, but let me share it anyway!
First I created a class that overrides __setattr__ to keep track of the order in which data is set.
Then I override __getattribute__ so that when the __dict__ attribute is requested we return a wrapper around it that behaves like the real dict, but implements all iterators to use the ordered keys list instead.

class FakeOrderedDict(object):
    def __init__(self, realDict, order):
        self.realDict = realDict
        self.order = order

    def __getitem__(self, key):
        return self.realDict[key]

    def __setitem__(self, key, value):
        self.realDict[key] = value

    def __iter__(self):
        return iter(self.order)

    def iterkeys(self):
        return iter(self.order)

    def itervalues(self):
        for key in self.order:
            yield self.realDict[key]

    def iteritems(self):
        for key in self.order:
            yield key, self.realDict[key]


class OrderedClass(object):
    def __init__(self):
        self.__dict__['_OrderedClass__attrs'] = []

    def __getattribute__(self, key):
        result = super(OrderedClass, self).__getattribute__(key)
        if key == '__dict__':
            if '_OrderedClass__attrs' in result:
                return FakeOrderedDict(result, result['_OrderedClass__attrs'])
        return result

    def __setattr__(self, key, value):
        order = self.__dict__['_OrderedClass__attrs']
        if key not in order:
            order.append(key)
        return super(OrderedClass, self).__setattr__(key, value)
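
If replacing __dict__ feels too invasive, a tamer variant is to record the order and expose it through an explicit method. This is only a sketch of that alternative, not what the editor above uses; OrderTracked, orderedItems and Emitter are hypothetical names.

```python
class OrderTracked(object):
    """Hypothetical alternative: remember assignment order, expose it via a method."""
    def __init__(self):
        # bypass our own __setattr__ so the bookkeeping list is not tracked itself
        object.__setattr__(self, '_order', [])

    def __setattr__(self, key, value):
        if key not in self._order:
            self._order.append(key)
        object.__setattr__(self, key, value)

    def orderedItems(self):
        """Yield (name, value) pairs in first-assignment order, skipping private names."""
        for key in self._order:
            if key[0] != '_':
                yield key, getattr(self, key)


class Emitter(OrderTracked):
    def __init__(self):
        super(Emitter, self).__init__()
        self.zeta = 1
        self.alpha = 2
        self.zeta = 3  # reassigning keeps the original position


items = list(Emitter().orderedItems())
```

The trade-off is that code iterating __dict__ directly, like _generateInstance, would have to call orderedItems instead.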

That’s all folks

Example usage:

# create test objects
class Vector(list):
    pass


class Compound(OrderedClass):  # inheriting from OrderedClass to ensure widget order
    def __init__(self):
        super(Compound, self).__init__()
        self.x = 2.0  # note how explicit floats are important now
        self.y = 5.0


class Data(OrderedClass):
    def __init__(self):
        super(Data, self).__init__()
        self.name = 'List test'
        self.value = Vector([1.0, 5, True])
        self.dict = {'A': Compound(), 'B': Compound()}


def groupHLayout(widgets):
    h = QHBoxLayout()
    m = QWidget()
    for w in widgets:
        h.addWidget(w)
    m.setLayout(h)
    yield m


# create test data
data = Data()

# create Qt application
app = QApplication([])
window = QWidget()
main = QVBoxLayout()
window.setLayout(main)

# initialize inspector
factory = AEFactory()
factory.registerType(bool, IconBoolEdit)
factory.registerType(int, SpinBox)
factory.registerType(float, DoubleSpinBox)
factory.registerType(str, LineEdit)
factory.registerWrapper(Vector, groupHLayout)

# inspect the data
for widget in factory.generate(data):
    main.addWidget(widget)

window.show()
app.exec_()

# print the data after closing the editor to show we indeed propagated the changes to the data as they happened
print data.name, data.value, data.dict['A'].x, data.dict['A'].y, data.dict, data.dict['B'].x, data.dict['B'].y

Last, have a full code dump!

from collections import OrderedDict
import functools
from PyQt4.QtCore import *
from PyQt4.QtGui import *


class SpinBox(QSpinBox):
    """
    QSpinBox with right limits & that follows the AEComponent interface.
    """

    def __init__(self, value=0, bits=32):
        super(SpinBox, self).__init__()
        self.setMinimum(-2 ** (bits - 1))
        self.setMaximum(2 ** (bits - 1) - 1)
        self.setValue(value)
        self.setLineEdit(LineEditSelected())

    def setValue(self, value):
        self.blockSignals(True)
        super(SpinBox, self).setValue(value)
        self.blockSignals(False)

    def editValue(self, value):
        super(SpinBox, self).setValue(value)


class DoubleSpinBox(QDoubleSpinBox):
    """
    QDoubleSpinBox with right limits & that follows the AEComponent interface.
    """

    def __init__(self, value=0.0):
        super(DoubleSpinBox, self).__init__()
        self.setMinimum(-float('inf'))
        self.setMaximum(float('inf'))
        self.setValue(value)
        self.setSingleStep(0.01)  # Depending on use case this can be very coarse.
        self.setLineEdit(LineEditSelected())

    def setValue(self, value):
        self.blockSignals(True)
        super(DoubleSpinBox, self).setValue(value)
        self.blockSignals(False)

    def editValue(self, value):
        super(DoubleSpinBox, self).setValue(value)


class IconBoolEdit(QPushButton):
    """
    QPushButton with icons to act as a boolean (not tri-state) toggle.
    """
    valueChanged = pyqtSignal(bool)

    def __init__(self, *args):
        super(IconBoolEdit, self).__init__(*args)
        self.__icons = None, None  # icons.get('Unchecked'), icons.get('Checked')  # Implement your own way to get icons!
        self.setIcon(self.__icons[0] or QIcon())
        self.setCheckable(True)
        self.clicked.connect(self.__updateIcons)
        self.clicked.connect(self.__emitValueChanged)

    def setIcons(self, off, on):
        self.__icons = off, on
        self.__updateIcons(self.isChecked())

    def __updateIcons(self, state):
        self.setIcon(self.__icons[int(state)] or QIcon())

    def __emitValueChanged(self, state):
        self.valueChanged.emit(state)

    def value(self):
        return self.isChecked()

    def setValue(self, state):
        self.setChecked(state)
        self.__updateIcons(state)

    def editValue(self, state):
        self.setChecked(state)
        self.__updateIcons(state)
        self.__emitValueChanged(state)


class LineEdit(QLineEdit):
    valueChanged = pyqtSignal(str)

    def __init__(self, *args):
        super(LineEdit, self).__init__(*args)
        self.textChanged.connect(self.valueChanged.emit)

    def value(self):
        return self.text()

    def setValue(self, text):
        self.blockSignals(True)
        self.setText(text)
        self.blockSignals(False)

    def editValue(self, text):
        self.setText(text)


class LineEditSelected(LineEdit):
    def __init__(self):
        super(LineEditSelected, self).__init__()
        self.__state = False

    def focusInEvent(self, event):
        super(LineEditSelected, self).focusInEvent(event)
        self.selectAll()
        self.__state = True

    def mousePressEvent(self, event):
        super(LineEditSelected, self).mousePressEvent(event)
        if self.__state:
            self.selectAll()
            self.__state = False


class AEFactory(object):
    def __init__(self):
        self.__typeWidgets = {}
        self.__typeWrappers = {}

    def registerType(self, dataType, widgetConstructor):
        self.__typeWidgets[dataType] = widgetConstructor

    def registerWrapper(self, dataType, wrapperFunction):
        """
        The wrapperFunction must accept a generator of widgets & return a generator of widgets.
        """
        self.__typeWrappers[dataType] = wrapperFunction

    @staticmethod
    def _allBaseTypes(cls):
        """
        Recurse all base classes and return a list of all bases with most close relatives first.
        https://stackoverflow.com/questions/1401661/list-all-base-classes-in-a-hierarchy-of-given-class
        """
        result = list(cls.__bases__)
        for base in cls.__bases__:  # loop over the direct bases; looping over result while extending it adds duplicates
            result.extend(AEFactory._allBaseTypes(base))
        return result

    def _findEditorForType(self, dataType):
        if dataType in self.__typeWidgets:
            return self.__typeWidgets[dataType]

        for baseType in AEFactory._allBaseTypes(dataType):
            if baseType in self.__typeWidgets:  # test the base type, not dataType again
                return self.__typeWidgets[baseType]
        return None

    def _findWrapperForType(self, dataType):
        if dataType in self.__typeWrappers:
            return self.__typeWrappers[dataType]

        for baseType in AEFactory._allBaseTypes(dataType):
            if baseType in self.__typeWrappers:  # test the base type, not dataType again
                return self.__typeWrappers[baseType]
        return None

    def generate(self, data, parent=None, name=None):
        """
        This recursively generates widgets & returns an iterator of every resulting item.
        """
        if isinstance(data, (dict, OrderedDict)):
            generator = self._generateMap(data)
        elif hasattr(data, '__getitem__') and hasattr(data, '__setitem__'):
            generator = self._generateList(data)
        elif hasattr(data, '__dict__'):
            generator = self._generateInstance(data)
        else:
            generator = self._generateField(data, parent, name)

        wrapper = self._findWrapperForType(type(data))
        if wrapper:
            generator = wrapper(generator)
        for widget in generator:
            yield widget

    def _generateField(self, data, parent, name):
        cls = self._findEditorForType(type(data))
        assert cls, 'Error: could not inspect object "%s" (parent: %s, name: %s). No wrapper registered or non-supported compound type.' % (data, parent, name)
        widget = cls()
        widget.setValue(data)
        if hasattr(parent, '__setitem__'):
            # lists & dicts: assign the item, setattr would not touch the container's contents
            widget.valueChanged.connect(functools.partial(parent.__setitem__, name))
        else:
            widget.valueChanged.connect(functools.partial(setattr, parent, name))
        yield widget

    def _generateInstance(self, data):
        for name in data.__dict__:
            if name[0] == '_':
                continue
            yield QLabel(name)
            for widget in self.generate(getattr(data, name), data, name):
                yield widget

    def _generateList(self, data):
        for i in xrange(len(data)):
            yield QLabel(str(i))
            for widget in self.generate(data[i], data, i):  # pass the index itself so it can be used for item assignment
                yield widget

    def _generateMap(self, data):
        for key in data:
            yield QLabel(str(key))
            for widget in self.generate(data[key], data, key):
                yield widget


class FakeOrderedDict(object):
    def __init__(self, realDict, order):
        self.realDict = realDict
        self.order = order

    def __getitem__(self, key):
        return self.realDict[key]

    def __setitem__(self, key, value):
        self.realDict[key] = value

    def __iter__(self):
        return iter(self.order)

    def iterkeys(self):
        return iter(self.order)

    def itervalues(self):
        for key in self.order:
            yield self.realDict[key]

    def iteritems(self):
        for key in self.order:
            yield key, self.realDict[key]


class OrderedClass(object):
    def __init__(self):
        self.__dict__['_OrderedClass__attrs'] = []

    def __getattribute__(self, key):
        result = super(OrderedClass, self).__getattribute__(key)
        if key == '__dict__':
            if '_OrderedClass__attrs' in result:
                return FakeOrderedDict(result, result['_OrderedClass__attrs'])
        return result

    def __setattr__(self, key, value):
        order = self.__dict__['_OrderedClass__attrs']
        if key not in order:
            order.append(key)
        return super(OrderedClass, self).__setattr__(key, value)


# create test objects
class Vector(list):
    pass


class Compound(OrderedClass):  # inheriting from OrderedClass to ensure widget order
    def __init__(self):
        super(Compound, self).__init__()
        self.x = 2.0  # note how explicit floats are important now
        self.y = 5.0


class Data(OrderedClass):
    def __init__(self):
        super(Data, self).__init__()
        self.name = 'List test'
        self.value = Vector([1.0, 5, True])
        self.dict = {'A': Compound(), 'B': Compound()}


def groupHLayout(widgets):
    h = QHBoxLayout()
    m = QWidget()
    for w in widgets:
        h.addWidget(w)
    m.setLayout(h)
    yield m


# create test data
data = Data()

# create Qt application
app = QApplication([])
window = QWidget()
main = QVBoxLayout()
window.setLayout(main)

# initialize inspector
factory = AEFactory()
factory.registerType(bool, IconBoolEdit)
factory.registerType(int, SpinBox)
factory.registerType(float, DoubleSpinBox)
factory.registerType(str, LineEdit)
factory.registerWrapper(Vector, groupHLayout)

# inspect the data
for widget in factory.generate(data):
    main.addWidget(widget)

window.show()
app.exec_()

# print the data after closing the editor to show we indeed propagated the changes to the data as they happened
print data.name, data.value, data.dict['A'].x, data.dict['A'].y, data.dict, data.dict['B'].x, data.dict['B'].y

Polygons & textures creeping in my raymarcher

This week I’ve been working on additional features in my 64k toolchain. None of this is yet viable for 64k executables but it enhances the tool quite a bit.

My first step was implementing vertex shader support. A cool thing about vertex shaders in OpenGL is that they are responsible for outputting the vertex data; nobody said anything about requiring input. So with a function like glDrawArraysInstanced, we have free rein in the vertex shader to generate points based on gl_VertexID and gl_InstanceID.

Here I’m generating a grid of 10×10 quads, with some barycentric coordinates added as per this article:

#version 420

uniform mat4 uV;
uniform mat4 uVi;
uniform mat4 uP;

out vec3 bary;
out vec2 uv;

void main()
{
    vec3 local = vec3(gl_VertexID % 2, gl_VertexID / 2, 0.5) - 0.5;
    vec3 global = vec3(gl_InstanceID % 10, gl_InstanceID / 10, 4.5) - 4.5;

    uv = (local + global).xy * vec2(0.1, 0.1 * 16 / 9) + 0.5;

    bary = vec3(0);
    bary[gl_VertexID % 3] = 1.0;

    gl_Position = uP * vec4(mat3(uVi) * ((local + global - uV[3].xyz) * vec3(1,1,-1)), 1);
}
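
To sanity check the index math on the CPU, the same arithmetic can be mirrored in Python. This is a sketch under the assumption that each instance is drawn with 4 vertices (for example as a triangle strip), which the shader implies but does not state; quadVertex is a hypothetical helper name.

```python
def quadVertex(vertexId, instanceId, columns=10):
    """Mirror the shader's integer math: corner offset plus grid cell, both centred on 0."""
    localX = vertexId % 2 - 0.5
    localY = vertexId // 2 - 0.5
    half = (columns - 1) / 2.0  # 4.5 for a 10x10 grid
    cellX = instanceId % columns - half
    cellY = instanceId // columns - half
    return localX + cellX, localY + cellY


# the four corners of the first quad, before the uv scaling and projection
firstQuad = [quadVertex(v, 0) for v in range(4)]
```

The grid spans -5 to 5 on both axes, which is why the shader scales by 0.1 to get normalized uv coordinates.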

This was surprisingly easy to implement. In the tool I scan a template definition XML to figure out which shader source files to stitch together and treat as 1 fragment shader. Adding the distinction between .frag and .vert files allowed me to compile the resulting program with a different vertex shader than the default one and it was up and running quite fast.

Next came a more interesting bit, mixing my raymarching things together with this polygonal grid.
There are 2 bits to this: one is matching the projection, the other is depth testing, which means matching the depth output from the raymarcher.

To project a vertex I subtract the ray origin from the vertex and then multiply it by the inverse rotation. Apparently that flips the Z axis, so I had to fix that too. Then I multiply that with the projection matrix. The “u” prefix means uniform variable.

vec4 viewCoord = vec4(uViewInverse * ((vertex - uRayOrigin) * vec3(1, 1, -1)), 1);

My ray direction is based on mixing the corners of a frustum these days. I used to rotate the ray to get a fisheye effect, but that doesn’t fly with regular projection matrices. My frustum calculation looks something like this (before going into the shader as a mat4):

tanFov = tan(uniforms.get('uFovBias', 0.5))
horizontalFov = (tanFov * aspectRatio)
uniforms['uFrustum'] = (-horizontalFov, -tanFov, 1.0, 0.0,
                        horizontalFov, -tanFov, 1.0, 0.0,
                        -horizontalFov, tanFov, 1.0, 0.0,
                        horizontalFov, tanFov, 1.0, 0.0)

So I can get a projection matrix from that as well. Additionally I added a uniform for the clipRange so the raymarcher near/far planes match the polygonal ones.

uniforms['uClipRange'] = (0.01, 100.0)
near, far = uniforms['uClipRange']
projection = Mat44.frustum(-horizontalFov * near, horizontalFov * near, -tanFov * near, tanFov * near, near, far)
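
For reference, this is roughly what such a frustum function computes. It is a sketch using the matrix documented for glFrustum, written as a list of rows; whether Mat44.frustum uses the same layout is an assumption on my part, and the aspect ratio and field of view values below are just example inputs.

```python
from math import tan


def frustum(left, right, bottom, top, near, far):
    """Perspective matrix as laid out by glFrustum, returned as a list of rows."""
    return [
        [2.0 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2.0 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


aspectRatio = 16.0 / 9.0  # example value
tanFov = tan(0.5)
horizontalFov = tanFov * aspectRatio
near, far = 0.01, 100.0
projection = frustum(-horizontalFov * near, horizontalFov * near,
                     -tanFov * near, tanFov * near, near, far)
```

Because the bounds are symmetric, the first two diagonal entries reduce to 1 / horizontalFov and 1 / tanFov, so the polygons use the same field of view as the ray frustum.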

For reference my raymarching ray looks like this:

vec4 d = mix(mix(uFrustum[0], uFrustum[1], uv.x), mix(uFrustum[2], uFrustum[3],uv.x), uv.y);
Ray ray = Ray(uV[3].xyz, normalize(d.xyz * mat3(uV)));

With this raymarching a 10x10x0.01 box matches up perfectly with the polygonal plane on top! Then the next issue is depth testing. All my render targets are now equipped with a float32 depth buffer, depth testing is enabled and before every frame I clear all depth buffers. Now I find my grid on top of my test scene because the raymarcher does not yet write the depth.

Following this nice article I learned a lot about this topic.
So to get the distance along Z I first define the world-space view axis (0,0,-1). Dotting that with (intersection - rayOrigin), which is the same as totalDistance * rayDirection, yields the right eye-space Z distance. The rest is explained in the article. It is pretty straightforward to map the Z using the previously defined clipping planes to match gl_DepthRange. I first fit it to the -1 to 1 range (ndcDepth) and then fit that to gl_DepthRange. One final trick is to fade to the far depth if we have 100% fog.

    vec3 viewForward = vec3(0.0, 0.0, -1.0) * mat3(uV);
    float eyeHitZ = hit.totalDistance * dot(ray.direction, viewForward);
    float ndcDepth = ((uClipRange.y + uClipRange.x) + (2 * uClipRange.y * uClipRange.x) / eyeHitZ) / (uClipRange.y - uClipRange.x);
    float z = ((gl_DepthRange.diff * ndcDepth) + gl_DepthRange.near + gl_DepthRange.far) / 2.0;
    gl_FragDepth = mix(z, gl_DepthRange.far, step(0.999, outColor0.w));
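
As a sanity check of the two mapping lines above, here is a small Python sketch of the same arithmetic. It assumes eye-space Z is negative in front of the camera (the OpenGL convention, which is what the formula needs to send the near plane to -1) and uses the default depth range of 0 to 1; ndc_depth() and window_depth() are hypothetical names.

```python
def ndc_depth(eye_z, near, far):
    # eye_z is negative in front of the camera (assumption, OpenGL eye space)
    return ((far + near) + (2.0 * far * near) / eye_z) / (far - near)


def window_depth(ndc, range_near=0.0, range_far=1.0):
    # Same as the gl_DepthRange line in the shader: maps [-1, 1] to [range_near, range_far]
    return ((range_far - range_near) * ndc + range_near + range_far) / 2.0


near, far = 0.01, 100.0
for eye_z in (-near, -1.0, -far):
    ndc = ndc_depth(eye_z, near, far)
    print(eye_z, ndc, window_depth(ndc))
```

A hit on the near plane ends up at depth 0 and a hit on the far plane at depth 1, with most of the precision spent close to the camera.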

Now as if that wasn’t enough cool stuff, I added the option to bind an image file to a shot. Whenever a shot gets drawn its texture list is queried, uploaded and bound to the user defined uniform names. Uploading is cached so every texture is loaded only once, I should probably add file watchers… The cool thing here is that not only can I now texture things, I can also enter storyboards and time them before working on actual 3D scenes!

Python dependency graph

Beware, code heavy post
Node graphs are a great way to express relationships and logical flow between objects, both for code design and 3D scenes. Where a tree can only express single parent-child relationships, a node graph can show a hierarchy and then display how one object has a look-at logic to look at another object entirely.

This can be done for almost any type of data, and a dependency graph is an evaluation model that tries to evaluate as little of the graph as possible. An evaluation model is the way our code traverses the graph’s data to compute the end result.

A dependency graph works by lazily computing data and by storing the “dirty” state of each attribute (dirty meaning something has been changed since the last calculation of this attribute). I’ll try to illustrate with this animation of 2 colors being mixed. When an input changes it propagates that change by “dirtying” all downstream dependencies (red).

When a value is requested, input wires are followed and for outputs the node is computed (yellow), but only if the attribute is “dirty”, else we just use the cached state (green).

So to sum it up in a few simple rules:

To keep it simple I’ll say “if an input on a node changes all its outputs are dirty”, which means nodes should be quite granular as having inputs that only change part of the outputs will waste computation.

Also any connection going out of an attribute (whether it is an input feeding another input directly or an output feeding another input) will dirty all targets. These 2 simple rules result in recursive dirtying of all the right inputs and outputs throughout the graph.

When a value is requested its cached value is returned unless it’s “dirty“ in which case (for outputs) the node will be computed or (for inputs) the other side of the connection will be queried, which results in this rule applying recursively and the minimal graph being computed to get the right value in the end.

Coding it

Now with that theory out of the way, I’ve written some python to do exactly this. First let’s take a look at the initial data structure. A plug must understand connections, ownership, dirty-state and retain an internal value (e.g. the result of computing an output, or just a constant value for an input that is not connected to anything).

from collections import OrderedDict


class Plug(object):
    def __init__(self, name, node, isComputed):
        self.__name = name  # really used for debugging only
        self.__node = node  # the parent node this plug belongs to
        self.__isComputed = isComputed  # make this an output-plug
        self.__isDirty = isComputed  # upstream dependencies changed? computed plug always have dependencies on creation
        self.__cache = None  # last computed value for outputs, user set value for inputs
        self.__sources = []  # plugs we depend on
        self.__targets = []  # plugs we feed into

    def clean(self):
        self.__isDirty = False

class DependNode(object):
    def __init__(self, computeFunc=None):
        self.__plugs = OrderedDict()  # dict of node plugs, dict key should match plug name
        self.__computeFunc = computeFunc  # logic implementation, should be a callable that accepts this DependNode instance as only argument

    def compute(self):
        # call compute function if node has logic
        if self.__computeFunc is not None:
            self.__computeFunc(self)

        # clean all plugs
        for plug in self.__plugs.itervalues():
            plug.clean()

    def addInput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=False)

    def addOutput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=True)

This template allows us to set up a basic node with inputs and outputs and to provide a custom function to calculate this node’s outputs.

const = DependNode()
const.addInput('value')

The next step is to allow setting a value on a specific plug. To do this, I’ll use some meta-programming to make the resulting code more readable: I implement __getattr__ and __setattr__ on the DependNode, as well as value() and setValue() on the Plug to read and alter Plug.__cache:

# Added to the end of the Plug class:
    def value(self):
        return self.__cache

    def setValue(self, value):
        self.__cache = value

# Added to the end of the DependNode class:
    def __getattr__(self, name):
        return self.__plugs[name].value()

    def __setattr__(self, name, value):
        if name not in ('_DependNode__plugs', '_DependNode__computeFunc'): # To enable setting attrs in __init__ we must revert to default behaviour for those attribute names, notice how mangled names are required for private attributes.
            return self.__plugs[name].setValue(value)
        return super(DependNode, self).__setattr__(name, value)

Now we can do this and see the internal state reflected properly:

const.value = 2.0
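
The mangled names in that __setattr__ check can look odd, so here is a tiny standalone Python sketch of what Python does with double-underscore attributes. The Recorder class is a made-up example and not part of the graph code.

```python
class Recorder(object):
    names = []  # every attribute name that passes through __setattr__

    def __init__(self):
        self.__plugs = {}  # written as __plugs, but stored under a mangled name

    def __setattr__(self, name, value):
        Recorder.names.append(name)
        super(Recorder, self).__setattr__(name, value)


Recorder()
print(Recorder.names)  # ['_Recorder__plugs']
```

Inside the class body self.__plugs is rewritten to self._Recorder__plugs at compile time, so that is the name __setattr__ receives and has to compare against.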

The next step I suppose is to create a simple “add” node which sums up its inputs. To do this we’ll have to add connections. I’ll implement this on the Plug class, where a plug’s sources and targets are managed implicitly:

    def addSource(self, sourcePlug):
        if sourcePlug not in self.__sources:
            self.__sources.append(sourcePlug)
            sourcePlug.__targets.append(self)

    def removeSource(self, sourcePlug):
        if sourcePlug in self.__sources:
            self.__sources.remove(sourcePlug)
            sourcePlug.__targets.remove(self)

The next step is to start implementing some of our rules. First rule we need is: compute when we are dirty. This I do in Plug.value().

    def value(self):
        if self.__isDirty:
            # we are going to get clean, set clean beforehand so the compute function can get other plug values without recursively triggering compute again.
            self.__isDirty = False
            # compute dirty output
            if self.__isComputed:
                self.__node.compute()
            # fetch dirty input connection
            elif self.__sources:
                self.__cache = [source.value() for source in self.__sources]
            # plug is clean now
            self.clean()
        # return internal state
        return self.__cache

So now outputs are computed, inputs return their sources, all this is cached on the plug so it only computes when dirty is true.
The next step is to actually set and propagate the dirty state. So whenever we set a value, add or remove a source, or a plug gets dirtied, all outgoing connections are dirtied. When an input gets dirtied, the node’s outputs are dirtied too. See the _dirty implementation below for the Plug class. The setValue, addSource and removeSource functions just get a _dirty call at the end.

# modifications to Plug
    def setValue(self, value):
        self.__cache = value
        self._dirty()

    def addSource(self, sourcePlug):
        if sourcePlug not in self.__sources:
            self.__sources.append(sourcePlug)
            sourcePlug.__targets.append(self)
            self._dirty()

    def removeSource(self, sourcePlug):
        if sourcePlug in self.__sources:
            self.__sources.remove(sourcePlug)
            sourcePlug.__targets.remove(self)
            self._dirty()

# added to Plug
    def _isComputed(self):
        return self.__isComputed

    def _dirty(self):
        if self.__isDirty:
            return
        self.__isDirty = True

        # dirty plugs that are computed based on this plug
        if not self.__isComputed:
            for plug in self.__node.iterPlugs():
                if plug._isComputed():
                    plug._dirty()

        # dirty plugs that use this plug as source
        for plug in self.__targets:
            plug._dirty()

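# the DependNode class also gets this iterPlugs implementation, plus a plug() accessor so we can reach the Plug objects to connect them
    def iterPlugs(self):
        return self.__plugs.itervalues()

    def plug(self, name):
        return self.__plugs[name]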

Whew, now after all that we should be able to run this test to add 2 and 4 together using 3 nodes:

const = DependNode()
const.addInput('value')
const.value = 2.0

const2 = DependNode()
const2.addInput('value')
const2.value = 4.0


def addFunc(node):
    node.output = sum(node.inputs)


add = DependNode(addFunc)
add.addInput('inputs')
add.addOutput('output')
add.plug('inputs').addSource(const.plug('value'))
add.plug('inputs').addSource(const2.plug('value'))

print add.output

Here is a slightly more interesting example of a pointOnCurve node that computes a point on a bezier segment and feeds it to a different node:

def pointOnCurve(node):
    # interpolate a 2D bezier segment with t=node.parameter
    cvs = node.cvs2[0]
    t = node.parameter[0]
    r = [0, 0]
    for i in xrange(2):
        a, b, c = cvs[1][i] - cvs[0][i], cvs[2][i] - cvs[1][i], cvs[3][i] - cvs[2][i]
        d = b - a
        a, b, c, d = c - b - d, d + d + d, a + a + a, cvs[0][i]
        r[i] = (t * (t * (t * a + b) + c) + d)
    node.point = tuple(r)

timeNode = DependNode()
timeNode.addInput('time')
timeNode.time = 0.0

curveNode = DependNode()
curveNode.addInput('cvs')
curveNode.cvs = ((0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2))

pointOnCurveNode = DependNode(pointOnCurve)
pointOnCurveNode.addInput('cvs2')
pointOnCurveNode.addInput('parameter')
pointOnCurveNode.plug('parameter').addSource(timeNode.plug('time'))
pointOnCurveNode.plug('cvs2').addSource(curveNode.plug('cvs'))
pointOnCurveNode.addOutput('point')

transform = DependNode()
transform.addInput('translate')
transform.plug('translate').addSource(pointOnCurveNode.plug('point'))

print transform.translate
timeNode.time = 0.5
print transform.translate
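
The coefficient shuffling in pointOnCurve is a bit dense, so here is a standalone Python sketch that checks it against the textbook Bernstein form of a cubic bezier. Both function names are made up for this check; the control points are the ones from the example above.

```python
def bezier_horner(cvs, t):
    # Same polynomial form as pointOnCurve above, evaluated with Horner's scheme
    result = []
    for i in range(2):
        a, b, c = cvs[1][i] - cvs[0][i], cvs[2][i] - cvs[1][i], cvs[3][i] - cvs[2][i]
        d = b - a
        a, b, c, d = c - b - d, d + d + d, a + a + a, cvs[0][i]
        result.append(t * (t * (t * a + b) + c) + d)
    return tuple(result)


def bezier_bernstein(cvs, t):
    # Textbook form: weighted sum of the 4 control points
    s = 1.0 - t
    weights = (s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t)
    return tuple(sum(w * cv[i] for w, cv in zip(weights, cvs)) for i in range(2))


cvs = ((0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2))
for t in (0.0, 0.25, 0.5, 1.0):
    print(t, bezier_horner(cvs, t), bezier_bernstein(cvs, t))
```

Both forms agree up to floating point rounding, and at t = 0.5 the point sits at roughly (0.1, 0.1), the middle of the S-shaped segment.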

Now there are 2 more improvements I want to add. First whenever a plug has sources, there is no real need to cache the result. We can just directly read from the sources and save ourselves a bunch of copying. Second I want to be able to alter an input plug temporarily. Imagine an interface with a point moving over time, it may be nice to alter that point by hand to e.g. feed that back in some animation system. In this case it is important for us to set the translate of a transform without it jumping back to the source time as soon as we redraw (which would read the translate attribute).

First I edit Plug.value() to use the cache only for computed data.
Now that __cache is freed up I plan to use it for the user-override. So if a cache is available I want to use it at all times.
Next I return sources if available, else cache again which should in this case be computed.

Next in Plug._dirty() I only dirty computed plugs, and I set the cache to None if the plug has sources coming in. This will result in a small problem with Plug.setValue: it currently caches the value and then dirties the plug. That means that on input plugs the cache is set to None immediately. I just swap those 2 lines so my setValue sticks, and the user override also works.

    def value(self):
        if self.__isDirty and self.__isComputed:
            # compute dirty output
            if self.__isComputed:
                self.__node.compute()
            # plug is clean now
            self.clean()

        # override cache for input attributes to intervene with connections?
        if self.__cache is not None:
            return self.__cache

        # fetch input connection
        if self.__sources:
            return [source.value() for source in self.__sources]

        return self.__cache

    def _dirty(self):
        if self.__isComputed:
            # don't dirty again
            if self.__isDirty:
                return
            self.__isDirty = True
        if self.__sources:
            self.__cache = None

        # dirty plugs that are computed based on this plug
        if not self.__isComputed:
            for plug in self.__node.iterPlugs():
                if plug._isComputed():
                    plug._dirty()

        # dirty plugs that use this plug as source
        for plug in self.__targets:
            plug._dirty()

    def setValue(self, value):
        self._dirty()
        self.__cache = value

Using the previous example I can now see how overriding the parameter works until time is changed again making it use the source connection again:

print transform.translate
timeNode.time = 0.5
print transform.translate
pointOnCurveNode.parameter = [0.25]  # a list, because pointOnCurve reads node.parameter[0] like it would for a connected plug
print transform.translate
timeNode.time = 0.5
print transform.translate

Full code:

from collections import OrderedDict


class Plug(object):
    def __init__(self, name, node, isComputed):
        self.__name = name  # really used for debugging only
        self.__node = node  # the parent node this plug belongs to
        self.__isComputed = isComputed  # make this an output-plug
        self.__isDirty = isComputed  # upstream dependencies changed?
        self.__cache = None  # last computed value for outputs, user set value for inputs
        self.__sources = []  # plugs we depend on
        self.__targets = []  # plugs we feed into

    def clean(self):
        self.__isDirty = False

    def value(self):
        if self.__isDirty and self.__isComputed:
            # compute dirty output
            if self.__isComputed:
                self.__node.compute()
            # plug is clean now
            self.clean()

        # override cache for input attributes to intervene with connections?
        if self.__cache is not None:
            return self.__cache

        # fetch input connection
        if self.__sources:
            return [source.value() for source in self.__sources]

        return self.__cache

    def setValue(self, value):
        self._dirty()
        self.__cache = value

    def addSource(self, sourcePlug):
        if sourcePlug not in self.__sources:
            self.__sources.append(sourcePlug)
            sourcePlug.__targets.append(self)
            self._dirty()

    def removeSource(self, sourcePlug):
        if sourcePlug in self.__sources:
            self.__sources.remove(sourcePlug)
            sourcePlug.__targets.remove(self)
            self._dirty()

    def _isComputed(self):
        return self.__isComputed

    def _dirty(self):
        if self.__isComputed:
            # don't dirty again
            if self.__isDirty:
                return
            self.__isDirty = True
        if self.__sources:
            self.__cache = None

        # dirty plugs that are computed based on this plug
        if not self.__isComputed:
            for plug in self.__node.iterPlugs():
                if plug._isComputed():
                    plug._dirty()

        # dirty plugs that use this plug as source
        for plug in self.__targets:
            plug._dirty()


class DependNode(object):
    def __init__(self, computeFunc=None):
        self.__plugs = OrderedDict()  # list of node plugs, dict key should match plug name
        self.__computeFunc = computeFunc  # logic implementation, should be a callable that accepts this DependNode instance as only argument

    def iterPlugs(self):
        return self.__plugs.itervalues()

    def compute(self):
        # call compute function if node has logic
        if self.__computeFunc is not None:
            self.__computeFunc(self)

        # clean all plugs
        for plug in self.__plugs.itervalues():
            plug.clean()

    def addInput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=False)

    def addOutput(self, name):
        self.__plugs[name] = Plug(name, self, isComputed=True)

    def plug(self, name):
        return self.__plugs[name]

    def __getattr__(self, name):
        return self.__plugs[name].value()

    def __setattr__(self, name, value):
        if name not in ('_DependNode__plugs', '_DependNode__computeFunc'):
            return self.__plugs[name].setValue(value)
        return super(DependNode, self).__setattr__(name, value)


def pointOnCurve(node):
    print 'compute'
    # interpolate a 2D bezier segment with t=node.parameter
    cvs = node.cvs2[0]
    t = node.parameter[0]
    r = [0, 0]
    for i in xrange(2):
        a, b, c = cvs[1][i] - cvs[0][i], cvs[2][i] - cvs[1][i], cvs[3][i] - cvs[2][i]
        d = b - a
        a, b, c, d = c - b - d, d + d + d, a + a + a, cvs[0][i]
        r[i] = (t * (t * (t * a + b) + c) + d)
    node.point = tuple(r)
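
The listing above is Python 2 (itervalues, xrange, print statements). As a rough cross-check of the caching rules, here is a condensed Python 3 sketch of the same idea. It drops the __getattr__ sugar and uses public attribute names, and it adds a compute_count field that is not in the original code, purely so the laziness can be observed.

```python
from collections import OrderedDict


class Plug(object):
    def __init__(self, node, is_computed):
        self.node = node
        self.is_computed = is_computed
        self.is_dirty = is_computed
        self.cache = None
        self.sources = []
        self.targets = []

    def value(self):
        if self.is_dirty and self.is_computed:
            self.node.compute()  # also cleans every plug on the node
        if self.cache is not None:
            return self.cache  # computed result or user override
        if self.sources:
            return [source.value() for source in self.sources]
        return self.cache

    def set_value(self, value):
        self.dirty()
        self.cache = value

    def add_source(self, source):
        if source not in self.sources:
            self.sources.append(source)
            source.targets.append(self)
            self.dirty()

    def dirty(self):
        if self.is_computed:
            if self.is_dirty:
                return
            self.is_dirty = True
        if self.sources:
            self.cache = None
        if not self.is_computed:
            for plug in self.node.plugs.values():
                if plug.is_computed:
                    plug.dirty()
        for plug in self.targets:
            plug.dirty()


class Node(object):
    def __init__(self, compute_func=None):
        self.plugs = OrderedDict()
        self.compute_func = compute_func
        self.compute_count = 0  # illustration only, counts compute() calls

    def add(self, name, is_computed=False):
        self.plugs[name] = Plug(self, is_computed)
        return self.plugs[name]

    def compute(self):
        self.compute_count += 1
        if self.compute_func is not None:
            self.compute_func(self)
        for plug in self.plugs.values():
            plug.is_dirty = False


def add_func(node):
    node.plugs['output'].set_value(sum(node.plugs['inputs'].value()))


a_value = Node().add('value')
a_value.set_value(2.0)
b_value = Node().add('value')
b_value.set_value(4.0)

adder = Node(add_func)
inputs = adder.add('inputs')
output = adder.add('output', is_computed=True)
inputs.add_source(a_value)
inputs.add_source(b_value)

first = output.value()   # dirty, so the node computes
second = output.value()  # clean, served from the cache
a_value.set_value(10.0)  # dirties the downstream output
third = output.value()   # computes again
print(first, second, third, adder.compute_count)
```

Two reads with one upstream change in between cost exactly two compute calls, which is the whole point of tracking the dirty state.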

Now I’ve also been working on sort of an entity-component based system on top of this graph, based on the PyOpenGL mesh renderer from a few posts back.

It allows me to create a scene graph based on Transforms and exported logic (using DependNodes like above) but also logical relationships which are not necessarily customizable, e.g. when rendering a camera I can query its entity, whose transform will provide the view matrix, without connecting these things. This creates a more classical (and less customizable) framework for render logic, but gives full control over the scene logic behind it through the dependency graph.

Using the point on curve example above and feeding it to a cube’s transform – while also implementing a timeline UI & 3D gizmo drawing – this fun thing came out:

Now the next steps would probably involve adding shader support, file watchers and creating a more elaborate Maya exporter that supports various Maya nodes (Maya is also dependency graph based so it fits very well with this!) but I’m not sure if I’m going to keep working on this.

Improving a renderer

This feeds into my previous write up on the tools developed for our 64kb endeavours.

After creating Eidolon [Video] we were left with the feeling that the rendering could be a lot better. We had this single pass bloom and simple lambert & phong shading, no anti-aliasing and very poorly performing depth of field. Lastly, the performance hit for reflections was through the roof as well.

I started almost immediately with a bunch of improvements, and most of this work was done within a month after Revision, which shows in our newest demo Yermom [Video]. I’ll go over the improvements in chronological order and credit any sources used (of which there were a lot), if I managed to document that right…

Something useful to mention, all my buffers are Float32 RGBA.

Low-resolution reflections:

Basically the scene is raymarched, for every pixel there is a TraceAndShade call to render the pixel excluding fog and reflection.
From the result we do another TraceAndShade for the reflection. This makes the entire thing twice as slow when reflections are on.
Instead I early out at this point if:
if(reflectivity == 0 || int(gl_FragCoord.x) % 4 != 0 || int(gl_FragCoord.y) % 4 != 0) return;
That results in only 1 in 16 pixels being reflective. So instead of compositing the reflection directly I write it to a separate buffer.
Then in a future pass I composite the 2 buffers, where I just do a look up in the reflection buffer like so:
texelFetch(uImages[0], ivec2(gl_FragCoord.xy), 0) + texelFetch(uImages[1], ivec2(gl_FragCoord.xy / 4) * 4, 0)
In my real scenario I removed that * 4 and render to a 4 times smaller buffer instead, so reading it back results in free interpolation.
I still have glitches when blurring the reflections too much & around edges in general. Definitely still room for future improvement.
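
To illustrate the pixel bookkeeping of that early out and the lookup in the composite pass, here is a small Python sketch. The function names are made up, and it models the variant with the * 4 still in place, where the reflection buffer has the same size as the screen.

```python
def traces_reflection(x, y, step=4):
    # Mirrors the early out: only pixels on a 4x4 lattice trace a reflection
    return x % step == 0 and y % step == 0


def reflection_lookup(x, y, step=4):
    # Mirrors ivec2(gl_FragCoord.xy / 4) * 4: snap to the lattice pixel that was traced
    return (x // step) * step, (y // step) * step


width, height = 16, 16
traced = sum(traces_reflection(x, y) for y in range(height) for x in range(width))
print(traced, width * height)    # 16 of 256 pixels
print(reflection_lookup(7, 10))  # (4, 8)
```

Every pixel in a 4x4 block reads the same traced texel, which is why the blocky result benefits from rendering into a smaller buffer and letting the sampler interpolate.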

Oren Nayar diffuse light response

The original paper and this image especially convinced me into liking this shading model for diffuse objects.

So I tried to implement that, failed a few times, got pretty close, found an accurate implementation, realized it was slow, and ended on these 2 websites:
http://www.popekim.com/2011/11/optimized-oren-nayar-approximation.html
http://www.artisticexperiments.com/cg-shaders/cg-shaders-oren-nayar-fast

They list a nifty trick to fake it. I took away some terms as I realized they contributed barely any visible difference, so I got something even less accurate. I already want to revisit this, but it’s one of the improvements I wanted to share nonetheless.

float orenNayarDiffuse(float satNdotV, float satNdotL, float roughness)
{
    float lambert = satNdotL;
    if(roughness == 0.0)
        return lambert;
    float softRim = saturate(1.0 - satNdotV * 0.5);

    // my magic numbers
    float fakey = pow(lambert * softRim, 0.85);
    return mix(lambert, fakey * 0.85, roughness);
}
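
To get a feel for what those magic numbers do, here is a direct Python port of the function above, with small stand-ins for the GLSL saturate and mix helpers. It is only a numeric illustration of the fake, not a reference Oren-Nayar implementation.

```python
def saturate(x):
    return max(0.0, min(1.0, x))


def mix(a, b, t):
    return a * (1.0 - t) + b * t


def oren_nayar_diffuse(sat_n_dot_v, sat_n_dot_l, roughness):
    lambert = sat_n_dot_l
    if roughness == 0.0:
        return lambert
    soft_rim = saturate(1.0 - sat_n_dot_v * 0.5)
    fakey = (lambert * soft_rim) ** 0.85
    return mix(lambert, fakey * 0.85, roughness)


print(oren_nayar_diffuse(1.0, 1.0, 0.0))  # smooth surface, plain lambert
print(oren_nayar_diffuse(1.0, 1.0, 1.0))  # rough, lit and viewed head-on: darker than lambert
print(oren_nayar_diffuse(0.0, 0.1, 1.0))  # rough, grazing view and light: brighter than lambert
```

So a rough surface gets darker where it faces the light and the viewer, and brighter towards grazing angles, which gives the flatter look this shading model is known for.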

GGX specular

There are various open source implementations of this. I found one here:
http://filmicworlds.com/blog/optimizing-ggx-shaders-with-dotlh/
It talks about tricks to optimize things by precomputing a lookup texture, I didn’t go that far. There’s not much I can say about this, as I don’t fully understand the math and how it changes from the basic phong dot(N, H).

float G1V(float dotNV, float k){return 1.0 / (dotNV * (1.0 - k)+k);}

float ggxSpecular(float NdotV, float NdotL, vec3 N, vec3 L, vec3 V, float roughness)
{
    float F0 = 0.5;

    vec3 H = normalize(V + L);
    float NdotH = saturate(dot(N, H));
    float LdotH = saturate(dot(L, H));
    float a2 = roughness * roughness;

    float D = a2 / (PI * sqr(sqr(NdotH) * (a2 - 1.0) + 1.0));
    float F = F0 + (1.0 - F0) * pow(1.0 - LdotH, 5.0);
    float vis = G1V(NdotL, a2 * 0.5) * G1V(NdotV, a2 * 0.5);
    return NdotL * D * F * vis;
}

FXAA

FXAA3 to be precise. The whitepaper is quite clear, but why bother writing it yourself if it’s open source? I can’t remember which one I used, but here are a few links:
https://gist.github.com/kosua20/0c506b81b3812ac900048059d2383126
https://github.com/urho3d/Urho3D/blob/master/bin/CoreData/Shaders/GLSL/FXAA3.glsl
https://github.com/vispy/experimental/blob/master/fsaa/fxaa.glsl
Preprocessing and minifying it for preset 12 made it very small in a compressed executable. Figured I’d just share it.

#version 420
uniform vec3 uTimeResolution;uniform sampler2D uImages[1];out vec4 z;float aa(vec3 a){vec3 b=vec3(.299,.587,.114);return dot(a,b);}
#define bb(a)texture(uImages[0],a)
#define cc(a)aa(texture(uImages[0],a).rgb)
#define dd(a,b)aa(texture(uImages[0],a+(b*c)).rgb)
void main(){vec2 a=gl_FragCoord.xy/uTimeResolution.yz,c=1/uTimeResolution.yz;vec4 b=bb(a);b.y=aa(b.rgb);float d=dd(a,vec2(0,1)),e=dd(a,vec2(1,0)),f=dd(a,vec2(0,-1)),g=dd(a,vec2(-1,0)),h=max(max(f,g),max(e,max(d,b.y))),i=h-min(min(f,g),min(e,min(d,b.y)));if(i<max(.0833,h*.166)){z=bb(a);return;}h=dd(a,vec2(-1,-1));float j=dd(a,vec2( 1,1)),k=dd(a,vec2( 1,-1)),l=dd(a,vec2(-1,1)),m=f+d,n=g+e,o=k+j,p=h+l,q=c.x;
bool r=abs((-2*g)+p)+(abs((-2*b.y)+m)*2)+abs((-2*e)+o)>=abs((-2*d)+l+j)+(abs((-2*b.y)+n)*2)+abs((-2*f)+h+k);if(!r){f=g;d=e;}else q=c.y;h=f-b.y,e=d-b.y,f=f+b.y,d=d+b.y,g=max(abs(h),abs(e));i=clamp((abs((((m+n)*2+p+o)*(1./12))-b.y)/i),0,1);if(abs(e)<abs(h))q=-q;else f=d;vec2 s=a,t=vec2(!r?0:c.x,r?0:c.y);if(!r)s.x+=q*.5;else s.y+=q*.5;
vec2 u=vec2(s.x-t.x,s.y-t.y);s=vec2(s.x+t.x,s.y+t.y);j=((-2)*i)+3;d=cc(u);e=i*i;h=cc(s);g*=.25;i=b.y-f*.5;j=j*e;d-=f*.5;h-=f*.5;bool v,w,x,y=i<0;
#define ee(Q) v=abs(d)>=g;w=abs(h)>=g;if(!v)u.x-=t.x*Q;if(!v)u.y-=t.y*Q;x=(!v)||(!w);if(!w)s.x+=t.x*Q;if(!w)s.y+=t.y*Q;
#define ff if(!v)d=cc(u.xy);if(!w)h=cc(s.xy);if(!v)d=d-f*.5;if(!w)h=h-f*.5;
ee(1.5)if(x){ff ee(2.)if(x){ff ee(4.)if(x){ff ee(12.)}}}e=a.x-u.x;f=s.x-a.x;if(!r){e=a.y-u.y;f=s.y-a.y;}q*=max((e<f?(d<0)!=y:(h<0)!=y)?(min(e,f)*(-1/(f+e)))+.5:0,j*j*.75);if(!r)a.x+=q;else a.y+=q;z=bb(a);}

Multi pass bloom

The idea for this one was heavily inspired by this asset for Unity:

https://www.assetstore.unity3d.com/en/#!/content/17324

I’m quite sure the technique is not original, but that’s where I got the idea.

The idea is to downsample and blur at many resolutions and then combine the (weighted) results to get a very high quality full screen blur.
So basically downsample to a quarter (factor 2) of the screen using this shader:

#version 420

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];

out vec4 outColor0;

void main()
{
    outColor0 = 0.25 * (texture(uImages[0], (gl_FragCoord.xy + vec2(-0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(0.5, -0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(0.5, 0.5)) / uResolution)
    + texture(uImages[0], (gl_FragCoord.xy + vec2(-0.5, 0.5)) / uResolution));
}

Then downsample that, and recurse until we have a factor 64.

All the downsamples fit in the backbuffer, so in theory that together with the first blur pass can be done in 1 go using the backbuffer as sampler2D as well. But to avoid the hassle of figuring out the correct (clamped!) uv coordinates I just use a ton of passes.
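
For a sense of scale, here is a short Python sketch that lists the buffer sizes of such a downsample chain. It assumes sizes are rounded down with integer division, which may differ from what the tool actually does, and bloom_chain() is a made-up name.

```python
def bloom_chain(width, height, levels=6):
    # One buffer per power of two, from factor 2 up to factor 2 ** levels (64 for 6 levels)
    sizes = []
    for level in range(1, levels + 1):
        factor = 2 ** level
        sizes.append((max(1, width // factor), max(1, height // factor)))
    return sizes


for size in bloom_chain(1920, 1080):
    print(size)
```

At 1080p that gives 6 buffers from 960x540 down to 30x16, so the extra passes are cheap compared to the full resolution frame.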

Then take all these downsampled buffers and ping pong them for blur passes, so for each buffer:
HBLUR taking steps of 2 pixels, into a buffer of the same size
VBLUR, back into the initial downsampled buffer
HBLUR taking steps of 3 pixels, reuse the HBLUR buffer
VBLUR, reuse the initial downsampled buffer

The pixel step is passed in as uBlurSize, the direction of the blur as uDirection.

#version 420

out vec4 color;

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];
uniform vec2 uDirection;
uniform float uBlurSize;

const float curve[7] = { 0.0205,
    0.0855,
    0.232,
    0.324,
    0.232,
    0.0855,
    0.0205 };

void main()
{
    vec2 uv = gl_FragCoord.xy / uResolution;
    vec2 netFilterWidth = uDirection / uResolution * uBlurSize;
    vec2 coords = uv - netFilterWidth * 3.0;

    color = vec4(0);
    for( int l = 0; l < 7; l++ )
    {
        vec4 tap = texture(uImages[0], coords);
        color += tap * curve[l];
        coords += netFilterWidth;
    }
}
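
The same 7 tap filter is easy to try out on a 1D list in Python. This sketch clamps at the borders (an assumption about the texture wrap mode) and uses the weights from the shader above; blur_1d() is a made-up name.

```python
CURVE = (0.0205, 0.0855, 0.232, 0.324, 0.232, 0.0855, 0.0205)


def blur_1d(values, step=1):
    # 7 taps centered on each element, spaced 'step' elements apart, clamped at the borders
    last = len(values) - 1
    result = []
    for center in range(len(values)):
        total = 0.0
        for tap, weight in enumerate(CURVE):
            index = min(max(center + (tap - 3) * step, 0), last)
            total += values[index] * weight
        result.append(total)
    return result


print(sum(CURVE))
print(blur_1d([0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]))
```

The weights sum to 1 (up to rounding), so the blur does not brighten or darken the image, and a single bright pixel is smeared into the curve itself.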

Last we combine passes with lens dirt. uImages[0] is the original backbuffer, 1-6 is all the downsampled and blurred buffers, 7 is a lens dirt image.
My lens dirt texture is pretty poor, it’s just a precalculated texture with randomly scaled and colored circles and hexagons, sometimes filled and sometimes outlined.
I don’t think I actually ever used the lens dirt or bloom intensity as uniforms.

#version 420

out vec4 color;

uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[8];
uniform float uBloom = 0.04;
uniform float uLensDirtIntensity = 0.3;

void main()
{
    vec2 coord = gl_FragCoord.xy / uResolution;
    color = texture(uImages[0], coord);

    vec3 b0 = texture(uImages[1], coord).xyz;
    vec3 b1 = texture(uImages[2], coord).xyz * 0.6; // dampen to have less banding in gamma space
    vec3 b2 = texture(uImages[3], coord).xyz * 0.3; // dampen to have less banding in gamma space
    vec3 b3 = texture(uImages[4], coord).xyz;
    vec3 b4 = texture(uImages[5], coord).xyz;
    vec3 b5 = texture(uImages[6], coord).xyz;

    vec3 bloom = b0 * 0.5
        + b1 * 0.6
        + b2 * 0.6
        + b3 * 0.45
        + b4 * 0.35
        + b5 * 0.23;

    bloom /= 2.2;
    color.xyz = mix(color.xyz, bloom.xyz, uBloom);

    vec3 lens = texture(uImages[7], coord).xyz;
    vec3 lensBloom = b0 + b1 * 0.8 + b2 * 0.6 + b3 * 0.45 + b4 * 0.35 + b5 * 0.23;
    lensBloom /= 3.2;
    color.xyz = mix(color.xyz, lensBloom, (clamp(lens * uLensDirtIntensity, 0.0, 1.0)));
    
    color.xyz = pow(color.xyz, vec3(1.0 / 2.2));
}

White lines on a cube, brightness of 10.

White lines on a cube, brightness of 300.

Sphere tracing algorithm

Instead of the rather naive sphere tracing loop that I used in a lot of 4kb productions and can just write by heart, I went for this paper:
http://erleuchtet.org/~cupe/permanent/enhanced_sphere_tracing.pdf
It is a clever technique that involves overstepping and backtracking only when necessary, as well as keeping track of pixel size in 3D to realize when there is no need to compute more detail. The paper is full of code snippets and clear infographics, I don’t think I’d be capable of explaining it any clearer.
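
For contrast, this is roughly what the naive loop looks like, sketched in Python against a unit sphere. To be clear, this is the basic technique and not the enhanced version from the paper; the names and the step and epsilon values are arbitrary choices for the example.

```python
import math


def sphere_sdf(p, radius=1.0):
    # Signed distance from point p to a sphere at the origin
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, epsilon=1e-4, max_distance=100.0):
    # Step along the ray by the distance to the nearest surface until we are close enough
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + d * t for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < epsilon:
            return t
        t += dist
        if t > max_distance:
            break
    return None


hit = sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = sphere_trace((0.0, 0.0, -3.0), (0.0, 1.0, 0.0), sphere_sdf)
print(hit, miss)
```

Every step is guaranteed not to pass through a surface, which is safe but wastes iterations along silhouettes, and that is the part the paper improves on.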

Beauty shots

Depth of field

I initially only knew how to do good circular DoF, until this one came along: https://www.shadertoy.com/view/4tK3WK
Which I used initially, but to get it to look good was really expensive, because it is all single pass. Then I looked into a 3-blur-pass solution, which sorta worked, but when I went looking for more optimized versions I found this 2 pass one: https://www.shadertoy.com/view/Xd3GDl. It works extremely well, the only edge cases I found were when unfocusing a regular grid of bright points.

Here’s what I wrote to get it to work with a depth buffer (depth based blur):

const int NUM_SAMPLES = 16;

void main()
{
    vec2 fragCoord = gl_FragCoord.xy;

    const vec2 blurdir = vec2( 0.0, 1.0 );
    vec2 blurvec = (blurdir) / uResolution;
    vec2 uv = fragCoord / uResolution.xy;

    float z = texture(uImages[0], uv).w;
    fragColor = vec4(depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES), z);
}

Second pass:

const int NUM_SAMPLES = 16;

void main()
{
    vec2 uv = gl_FragCoord.xy / uResolution;

    float z = texture(uImages[0], uv).w;

    vec2 blurdir = vec2(1.0, 0.577350269189626);
    vec2 blurvec = normalize(blurdir) / uResolution;
    vec3 color0 = depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES);

    blurdir = vec2(-1.0, 0.577350269189626);
    blurvec = normalize(blurdir) / uResolution;
    vec3 color1 = depthDirectionalBlur(z, CoC(z), uv, blurvec, NUM_SAMPLES);

    vec3 color = min(color0, color1);
    fragColor = vec4(color, 1.0);
}

Shared header:

#version 420

// default uniforms
uniform vec3 uTimeResolution;
#define uTime (uTimeResolution.x)
#define uResolution (uTimeResolution.yz)

uniform sampler2D uImages[1];

uniform float uSharpDist = 15; // distance from camera that is 100% sharp
uniform float uSharpRange = 0; // distance from the sharp center that remains sharp
uniform float uBlurFalloff = 1000; // distance from the edge of the sharp range it takes to become 100% blurry
uniform float uMaxBlur = 16; // radius of the blur in pixels at 100% blur

float CoC(float z)
{
    return uMaxBlur * min(1, max(0, abs(z - uSharpDist) - uSharpRange) / uBlurFalloff);
}

out vec4 fragColor;

//note: uniform pdf rand [0;1)
float hash1(vec2 p)
{
    p = fract(p * vec2(5.3987, 5.4421));
    p += dot(p.yx, p.xy + vec2(21.5351, 14.3137));
    return fract(p.x * p.y * 95.4307);
}

#define USE_RANDOM

vec3 depthDirectionalBlur(float z, float coc, vec2 uv, vec2 blurvec, int numSamples)
{
    // z: z at UV
    // coc: blur radius at UV
    // uv: initial coordinate
    // blurvec: smudge direction
    // numSamples: blur taps
    vec3 sumcol = vec3(0.0);

    for (int i = 0; i < numSamples; ++i)
    {
        float r =
            #ifdef USE_RANDOM
            (i + hash1(uv + float(i + uTime)) - 0.5)
            #else
            i
            #endif
            / float(numSamples - 1) - 0.5;
        vec2 p = uv + r * coc * blurvec;
        vec4 smpl = texture(uImages[0], p);
        if(smpl.w < z) // if sample is closer, consider its CoC
        {
            // Alternative (unused): clamp to the center pixel's CoC.
            // p = uv + r * min(coc, CoC(smpl.w)) * blurvec;
            p = uv + r * CoC(smpl.w) * blurvec;
            smpl = texture(uImages[0], p);
        }
        sumcol += smpl.xyz;
    }

    sumcol /= float(numSamples);
    sumcol = max(sumcol, 0.0);

    return sumcol;
}
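To get a feel for the CoC() function in the shared header, here is a small Python port of the same formula, using the uniform defaults from the header as default arguments. The Python names are only for illustration; this is a sketch for checking numbers, not part of the shader.

```python
def coc(z, sharp_dist=15.0, sharp_range=0.0, blur_falloff=1000.0, max_blur=16.0):
    """Blur radius in pixels for a sample at depth z (mirrors the GLSL CoC)."""
    # Distance from the sharp zone; zero while we are inside it.
    outside_sharp_zone = max(0.0, abs(z - sharp_dist) - sharp_range)
    # Ramp linearly up to the maximum blur over blur_falloff units.
    return max_blur * min(1.0, outside_sharp_zone / blur_falloff)
```

With the defaults, a sample at depth 15 is fully sharp (radius 0), depth 515 is halfway through the falloff (radius 8) and anything from depth 1015 onward is clamped to the full 16 pixel radius.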

Additional sources used for a longer time

Distance function library

http://mercury.sexy/hg_sdf/
A very cool site explaining all kinds of things you can do with this code. I think many of these functions had been invented already, but here they come with some bonuses as well as a very clear code style and excellent documentation for full accessibility.
For an introduction to this library:
https://www.youtube.com/watch?v=T-9R0zAwL7s

Noise functions

https://www.shadertoy.com/view/4djSRW
Hashes optimized to only implement hash4() and the rest is just swizzling and redirecting, so a float based hash is just:

float hash1(float x){return hash4(vec4(x)).x;}
vec2 hash2(float x){return hash4(vec4(x)).xy;}

And so on.

Value noise
https://www.shadertoy.com/view/4sfGzS
https://www.shadertoy.com/view/lsf3WH

Voronoi 2D
https://www.shadertoy.com/view/llG3zy
Voronoi is great, as using the center distance we get Worley noise instead, and we can track cell indices for randomization.
This is fairly fast, but still too slow to do realtime. So I implemented tileable 2D & 3D versions.

Perlin
Layering the value noise for N iterations, scaling the UV by 2 and weight by 0.5 in every iteration.
These could be controllable parameters for various different looks. A slower weight decrease results in a more wood-grain look for example.

float perlin(vec2 p, int iterations)
{
    float f = 0.0;
    float amplitude = 1.0;

    for (int i = 0; i < iterations; ++i)
    {
        f += snoise(p) * amplitude;
        amplitude *= 0.5;
        p *= 2.0;
    }

    return f * 0.5;
}
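The "controllable parameters" idea can be sketched in Python (rather than GLSL, so it is easy to run). This is only an illustration under some assumptions: hash1() is a port of the vec2 hash from the depth of field header, value_noise() is a plain value noise standing in for snoise(), and the names gain and lacunarity are made up here. Unlike the GLSL above, the result is normalized by the summed amplitudes instead of a fixed 0.5, so it stays in range for any gain.

```python
import math

def fract(v):
    return v - math.floor(v)

def hash1(x, y):
    # Port of the vec2 hash1() shown in the depth of field header.
    px, py = fract(x * 5.3987), fract(y * 5.4421)
    d = py * (px + 21.5351) + px * (py + 14.3137)
    px, py = px + d, py + d
    return fract(px * py * 95.4307)

def lerp(a, b, t):
    return a + (b - a) * t

def value_noise(x, y):
    # Smoothly interpolate random values at the four surrounding grid points.
    cx, cy = math.floor(x), math.floor(y)
    fx, fy = x - cx, y - cy
    fx = fx * fx * (3.0 - 2.0 * fx)
    fy = fy * fy * (3.0 - 2.0 * fy)
    bottom = lerp(hash1(cx, cy), hash1(cx + 1, cy), fx)
    top = lerp(hash1(cx, cy + 1), hash1(cx + 1, cy + 1), fx)
    return lerp(bottom, top, fy)

def layered_noise(x, y, iterations, gain=0.5, lacunarity=2.0):
    # gain: weight multiplier per layer, lacunarity: UV multiplier per layer.
    # iterations must be at least 1.
    total, amplitude, amplitude_sum = 0.0, 1.0, 0.0
    for _ in range(iterations):
        total += value_noise(x, y) * amplitude
        amplitude_sum += amplitude
        amplitude *= gain
        x *= lacunarity
        y *= lacunarity
    return total / amplitude_sum
```

Calling layered_noise(x, y, 6, gain=0.7) keeps more of the fine layers than the default gain of 0.5, which is the "slower weight decrease" mentioned above.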

Now the perlin logic can be applied to worley noise (voronoi center) to get billows. I did the same for the voronoi edges, all tileable in 2D and 3D for texture precalc. Here’s an example. Basically the modulo in the snoise function is the only thing necessary to make things tileable. Perlin then just uses that and keeps track of the scale for that layer.

float snoise_tiled(vec2 p, float scale)
{
    p *= scale;
    vec2 c = floor(p);
    vec2 f = p - c;
    f = f * f * (3.0 - 2.0 * f);
    return mix(mix(hash1(mod(c + vec2(0.0, 0.0), scale) + 10.0),
                   hash1(mod(c + vec2(1.0, 0.0), scale) + 10.0), f.x),
               mix(hash1(mod(c + vec2(0.0, 1.0), scale) + 10.0),
                   hash1(mod(c + vec2(1.0, 1.0), scale) + 10.0), f.x), f.y);
}

float perlin_tiled(vec2 p, float scale, int iterations)
{
    float f = 0.0;
    p = mod(p, scale);
    float amplitude = 1.0;
    
    for (int i = 0; i < iterations; ++i)
    {
        f += snoise_tiled(p, scale) * amplitude;
        amplitude *= 0.5;
        scale *= 2.0;
    }

    return f * 0.5;
}
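A quick way to convince yourself that the modulo really makes things tile is to port the two functions above to Python and compare samples one tile apart. This is a sketch under the assumption that hash1() is the vec2 hash from the depth of field header; the Python function signatures (separate x and y instead of a vec2) are just for illustration.

```python
import math

def fract(v):
    return v - math.floor(v)

def hash1(x, y):
    # Port of the vec2 hash1() shown in the depth of field header.
    px, py = fract(x * 5.3987), fract(y * 5.4421)
    d = py * (px + 21.5351) + px * (py + 14.3137)
    px, py = px + d, py + d
    return fract(px * py * 95.4307)

def lerp(a, b, t):
    return a + (b - a) * t

def snoise_tiled(x, y, scale):
    x, y = x * scale, y * scale
    cx, cy = math.floor(x), math.floor(y)
    fx, fy = x - cx, y - cy
    fx = fx * fx * (3.0 - 2.0 * fx)
    fy = fy * fy * (3.0 - 2.0 * fy)

    def corner(ox, oy):
        # Wrapping the cell index is what makes the noise repeat.
        return hash1((cx + ox) % scale + 10.0, (cy + oy) % scale + 10.0)

    return lerp(lerp(corner(0, 0), corner(1, 0), fx),
                lerp(corner(0, 1), corner(1, 1), fx), fy)

def perlin_tiled(x, y, scale, iterations):
    total, amplitude = 0.0, 1.0
    x, y = x % scale, y % scale
    for _ in range(iterations):
        total += snoise_tiled(x, y, scale) * amplitude
        amplitude *= 0.5
        scale *= 2.0  # finer layer, still wrapping at the same tile size
    return total * 0.5
```

With an integer scale, perlin_tiled(0.3, 0.7, 4.0, 5) and perlin_tiled(1.3, 0.7, 4.0, 5) agree up to floating point rounding, so the pattern repeats every 1.0 in both directions.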

Creating a tool to make a 64k demo

In the process of picking up this webpage again, I can talk about something we did quite a while ago. I, together with a team, went through the process of making a 64 kilobyte demo. We happened to win at one of the biggest demoscene events in Europe: Revision 2017. I still feel the afterglow of happiness from that.

If you’re not sure what that is, read on; otherwise, scroll down! You program a piece of software that is only 64 KB in size and shows an audio-visual experience generated in realtime. To stay within such size limits you have to generate everything. We chose to go for a rendering technique called ray marching, which allowed us to put all 3D modeling, texture generation, lighting, etc. as ASCII (GLSL sources) in the executable. On top of that we used a very minimal (yet versatile) modular synthesizer called 64klang2. Internally it stores a kind of minimal MIDI data plus the patches, and it can render amazing audio in realtime, so it doesn’t need to pre-render the song or anything. All this elementary, small data and code compiles to something over 200 KB, which is then compressed using an executable packer like UPX or kkrunchy.

It was called Eidolon. You can watch a video:
https://youtu.be/rsZHBJdaz-Y
Or stress test your GPU / leave a comment here:
http://www.pouet.net/prod.php?which=69669

The technologies used were fairly basic: very old school Phong & Lambert shading and 2 blur passes for bloom, so all in all pretty low tech and not worth discussing. What I would like to discuss is the evolution of the tool. I’ll keep it high level this time though. Maybe in the future I can talk about specific implementations of things, but just seeing the UI will probably explain a lot of the features and the way things work.

Step 1: Don’t make a tool from scratch

Our initial idea was to leverage existing software. One of our team members, who ran the team in addition to modelling and eventually directing the whole creative result, had some experience with a real-time, node-based tool called Touch Designer. It lets you do realtime visuals, and it supports exactly what we need: rendering into a 2D texture with a fragment shader.

We wanted to have the same rendering code for all scenes, and just fill in the modeling and material code that is unique per scene. We figured out how to concatenate separate pieces of text and draw them into a buffer. Multiple buffers even. At some point I packed all code and rendering logic of a pass into 1 grouped node and we could design our render pipeline entirely node based.

Here you see the text snippets (1) merged into some buffers (2) and then post processed for the bloom (3). On the right (4) you see the first problem we hit with Touch Designer. The compiler error log is drawn inside this node. There is basically no easy way to have that error visible in the main application somewhere. So the first iteration of the renderer (and coincidentally the main character of Eidolon) looked something like this:

The renderer didn’t really change after this.

In case I sound too negative about Touch Designer in the next few paragraphs, our use case was rather special, so take this with a grain of salt!

We had a timeline control, borrowing the UI design from Maya a little, so this became the main preview window. That’s when we hit some problems though. The software has no concept of window focus, so it’d constantly suffer from stuck keys or respond to key presses while we were typing in the text editor.

The last issue is what really killed it though: everything has to be in 1 binary file. There is no native way to reference external text files for the shader code, or merge node graphs. There is a really weird utility that expands the binary to ASCII, but then literally every single node is a text file so it is just unmergeable.

Step 2: Make a tool

So then this happened:

Over a week’s time in the evenings and then 1 long Saturday I whipped this up using PyQt and PyOpenGL. This is the first screenshot I made; the curve editor isn’t actually an editor yet and there is no concept of camera shots (we use this to get hard cuts).

It has all the same concepts however: separate text files for the shader code, with an XML file determining which render passes use which files, which buffer they render into and which buffers they reference in turn. The added advantage is the perfect granularity, with everything stored in ASCII files.

Some files are template-level, some are scene-level, so creating a new scene actually only copies the scene-level files, which can then be adjusted in a text editor, with a file watcher updating the picture. The CurveEditor feeds right back into the uniforms of the shader (by name) and the time slider at the bottom is the same idea as Maya / what you saw before.

Step 3: Make it better

Render pipeline
The concept was to set up a master render pipeline into which scenes would inject snippets of code. On disk this became a bunch of snippets, and an XML based template definition. This would be the most basic XML file:

<template>
    <pass buffer="0" outputs="1">
        <global path="header.glsl"/>
        <section path="scene.glsl"/>
        <global path="pass.glsl"/>
    </pass>
    <pass input0="0">
        <global path="present.glsl"/>
    </pass>
</template>

This will concatenate 3 files into 1 fragment shader, render into full-screen buffer “0” and then use present.glsl as another fragment shader, which in turn has the previous buffer “0” as input (forwarded to a sampler2D uniform).
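To make the stitching concrete, here is a minimal Python sketch of how such a template could be turned into per-pass shader sources. It is not the tool’s actual code: build_passes() and the dictionaries standing in for files on disk are made up for illustration, and I am assuming “global” paths come from the template while “section” paths come from the scene.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<template>
    <pass buffer="0" outputs="1">
        <global path="header.glsl"/>
        <section path="scene.glsl"/>
        <global path="pass.glsl"/>
    </pass>
    <pass input0="0">
        <global path="present.glsl"/>
    </pass>
</template>"""

# Stand-ins for the snippet files on disk (hypothetical contents).
GLOBAL_SOURCES = {"header.glsl": "// header", "pass.glsl": "// pass",
                  "present.glsl": "// present"}
SCENE_SOURCES = {"scene.glsl": "// scene"}

def build_passes(template_xml, global_sources, scene_sources):
    passes = []
    for pass_node in ET.fromstring(template_xml).findall("pass"):
        chunks = []
        for stitch in pass_node:
            # <section> snippets are unique per scene, <global> ones are shared.
            lookup = scene_sources if stitch.tag == "section" else global_sources
            chunks.append(lookup[stitch.get("path")])
        # Collect input0, input1, ... in numeric order.
        input_keys = sorted((k for k in pass_node.attrib if k.startswith("input")),
                            key=lambda k: int(k[len("input"):]))
        passes.append({
            "buffer": pass_node.get("buffer"),  # None means: draw to the screen
            "inputs": [pass_node.get(k) for k in input_keys],
            "source": "\n".join(chunks),
        })
    return passes

passes = build_passes(TEMPLATE, GLOBAL_SOURCES, SCENE_SOURCES)
```

Each entry in passes then holds the concatenated fragment shader text, the buffer it renders into and the buffers it wants bound as inputs.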

This branched out into making static buffers (textures), setting buffer sizes (smaller textures), multiple target buffers (render main and reflection pass at once), setting the buffer size to a portion of the screen (downsampling for bloom) and 3D texture support (volumetric noise textures for clouds).

Creating a new scene will just copy “scene.glsl” from the template to a new folder, there you can then fill out the necessary function(s) to get a unique scene. Here’s an example from our latest Evoke demo. 6 scenes, under which you see the “section” files for each scene.

Camera control
The second important thing I wanted to tackle was camera control. Basically the demo will control the camera based on some animation data, but it is nice to fly around freely and even use the current camera position as animation keyframe. So this was just using Qt’s event system to hook up the mouse and keyboard to the viewport.

I also created a little widget that displays where the camera is, has an “animation input or user input” toggle as well as a “snap to current animation frame” button.

Animation control
So now to animate the camera, without hard coding values! Or even typing numbers, preferably. I know a lot of people use a tracker-like tool called Rocket. I never used it and it looks like an odd way to control animation data to me. I come from a 3D background, so I figured I’d just want a curve editor like e.g. Maya has. In Touch Designer we also had a basic curve editor. Conveniently, you can name a channel the same as a uniform, then just have code evaluate the curve at the current time and send the result to that uniform location.
Some trickery was necessary to pack vec3s, I just look for channels that start with the same name and then end in .x, .y, .z, and possibly .w.
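The packing trick could look roughly like this in Python. This is a sketch, not the tool’s code: pack_channels() is a made-up name, and I am assuming the channel values have already been evaluated at the current time.

```python
def pack_channels(values):
    """Group 'name.x', 'name.y', ... channels into one tuple per uniform.

    values maps channel names to floats evaluated at the current time.
    """
    components = "xyzw"
    packed = {}
    grouped = {}
    for name, value in values.items():
        base, dot, suffix = name.rpartition(".")
        if dot and len(suffix) == 1 and suffix in components:
            grouped.setdefault(base, {})[suffix] = value
        else:
            packed[name] = value  # plain float uniform
    for base, parts in grouped.items():
        expected = components[:len(parts)]
        if set(parts) != set(expected):
            # E.g. .x and .z without .y: we can't build a vector from that.
            raise ValueError("incomplete vector channels for %r" % base)
        packed[base] = tuple(parts[c] for c in expected)
    return packed
```

For example, channels uCamPos.x, uCamPos.y and uCamPos.z (hypothetical names) end up as one 3-tuple under uCamPos, ready to be sent as a vec3, while a plain uFov channel stays a float.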

Here’s an excerpt from a long camera shot with lots of movement, showing off our cool Hermite splines. At the top right you can see we have several built in tangent modes; we never got around to building custom tangent editing. In the end this is more than enough however. With flat tangents we can create easing/acceleration, with spline tangents we can get continuous paths and with linear tangents we get continuous speed. Next to that are 2 cool buttons that allow us to feed the camera position to another uniform, so you can literally fly to a place where you want to put an object. It’s not as good as actual move/rotate widgets but for the limited times we need to place 3D objects it’s great.
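For reference, evaluating such a curve with the three tangent modes could be sketched like this. The exact tangent definitions in the tool may differ; here I assume “flat” means zero slope, “linear” means the slope of the segment itself and “spline” means a Catmull-Rom style slope through the neighbouring keys. All names are made up for the example.

```python
import bisect

def hermite(v0, m0, v1, m1, dt, s):
    # Cubic Hermite basis, s in [0, 1], slopes m0/m1 in value per time unit.
    s2, s3 = s * s, s * s * s
    return ((2 * s3 - 3 * s2 + 1) * v0 + (s3 - 2 * s2 + s) * dt * m0 +
            (-2 * s3 + 3 * s2) * v1 + (s3 - s2) * dt * m1)

def spline_slope(keys, i):
    # Slope between the previous and next key, clamped at the curve ends.
    prev_t, prev_v = keys[max(i - 1, 0)]
    next_t, next_v = keys[min(i + 1, len(keys) - 1)]
    return (next_v - prev_v) / (next_t - prev_t)

def evaluate(keys, t, mode="spline"):
    """keys: list of (time, value) tuples sorted by strictly increasing time."""
    if t <= keys[0][0]:
        return keys[0][1]
    if t >= keys[-1][0]:
        return keys[-1][1]
    i = bisect.bisect_right([k[0] for k in keys], t) - 1
    (t0, v0), (t1, v1) = keys[i], keys[i + 1]
    dt = t1 - t0
    if mode == "flat":
        m0 = m1 = 0.0
    elif mode == "linear":
        m0 = m1 = (v1 - v0) / dt
    elif mode == "spline":
        m0, m1 = spline_slope(keys, i), spline_slope(keys, i + 1)
    else:
        raise ValueError("unknown tangent mode: %r" % mode)
    return hermite(v0, m0, v1, m1, dt, (t - t0) / dt)
```

With flat tangents the value eases out of and into every key, with linear tangents the Hermite formula collapses to straight interpolation, and with spline tangents the slope carries through each key so paths stay smooth.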

Hard cuts
Apart from being impossible to represent in this interface, we don’t support 2 keys at identical times. This means that we can’t really have the camera “jump” to a new position instantly. With a tiny amount of curve in between the previous and the next shot position, the time cursor can actually render 1 frame of a random camera position. So we had to solve this. I think it is one of the only big features that you won’t see in the initial screenshot above actually.

Introducing camera shots. A shot has its own “scene it should display” and its own set of animation data. So selecting a different shot yields different curve editor content. Shots are placed on a shared timeline, so scrolling through time will automatically show the right shot and setting a keyframe will automatically figure out the “shot local time” to put the key based on the global demo time. The curve editor has its own playhead that is directly linked to the global timeline as well so we can adjust the time in multiple places.
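The lookup from global demo time to a shot and its local time is simple enough to sketch. The Shot class and its field names below are assumptions for illustration, not the tool’s real data model. Treating the end time as exclusive is what gives a clean hard cut: at the exact cut time only the next shot matches.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    name: str
    scene: str    # which scene this shot displays
    start: float  # global time (in beats) where the shot begins
    end: float    # global time where it ends (exclusive)

def shot_at(shots, global_time):
    """Return (shot, shot_local_time), or None if no shot covers this time."""
    for shot in shots:
        if shot.start <= global_time < shot.end:
            return shot, global_time - shot.start
    return None

shots = [Shot("intro", "tunnel", 0.0, 16.0),
         Shot("reveal", "city", 16.0, 48.0)]
```

Here shot_at(shots, 20.0) picks the “reveal” shot at local time 4.0, which is where a new keyframe would be stored in that shot’s own animation data.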

When working with lots of people we had issues with people touching other people’s (work in progress) shots. Therefore we introduced “disabling” of shots. This way anyone could just prefix their shots and disable them before submitting, and we could mix and match shots from several people to get a final camera flow we all liked.

Shots are also rendered on the timeline as colored blocks. The grey block underneath those is our “range slider”. It makes the top part apply on only a subsection of the demo, so it is easy to loop a specific time range, or just zoom in far enough to use the mouse to change the time granularly enough.

The devil is in the details
Some things I overlooked in the first implementation, and some useful things I added only recently.
1. Undo/Redo of animation changes. Not unimportant, and luckily not hard to add with Qt.
2. Ctrl click timeline to immediately start animating that shot
3. Right click a shot to find the scene
4. Right click a scene to create a shot for that scene in particular
5. Current time display in minutes:seconds instead of just beats
6. BPM stored per-project instead of globally
7. Lots of hotkeys!

These things make the tool just that much faster to use.
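As an example of how small some of these are, the minutes:seconds display from item 5 boils down to a conversion from beats using the project’s BPM. A minimal sketch, with a made-up function name:

```python
def format_time(beats, bpm):
    """Format a time in beats as minutes:seconds.hundredths."""
    # Work in whole hundredths of a second to avoid "0:60.00" rounding glitches.
    total_hundredths = int(round(beats * 6000.0 / bpm))
    minutes, rest = divmod(total_hundredths, 6000)
    seconds, hundredths = divmod(rest, 100)
    return "%d:%02d.%02d" % (minutes, seconds, hundredths)
```

At 128 BPM, beat 256 is exactly two minutes in, so format_time(256, 128) gives "2:00.00".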

Finally, here’s our tool today. There’s still plenty to be done, but we made 2 demos with it so far and it gets better every time!