binder data-structures

by Vladimir Zlatanov (vlado at dikini net)
starter

In the previous write-ups I discussed the ideas behind how binder is supposed to work. It is still a work in progress, so new ideas come as new problems surface. In this piece I will try to sum up how I look at the VM's structures, which describe the actual language. The VM is in fact capable of doing more than I am planning to implement, since I want to keep the possibility of extending the language.

going loopy

If you have read the state machines and metadata write-ups carefully, you may have noticed that nothing stops you from defining a field which is a wrapper for a node. All that is fine. Really good, but the creepy monster from the bottomless recursion abyss pops out: nothing restricts the definition of recursive chains of wrapped nodes. As a kid I learnt that all monsters are good at heart, we just can't stand their smell, or some other feature. This particular one has the habit of draining your resources.

Let's try to understand the problem better. A node can be defined, for example, as

eq 1
N = A + B + (C + (D + F + E) + A), where A to F are fields
M = A + B + C ; C = {M} + D, where {} is a wrap field
What do these reveal? The first definition shows the scenario where a compound field adds a second instance of the field A. The second shows a bottomless recursive composition: node M includes itself through the wrap field in C. A mini Cantor's paradox. But the computer is not a philosopher, so some solution is needed. We need to resolve the multiple-instances issue somehow, and the related recursion.

Let's take the definition of N first. If we assume the order of fields is not important, we can simplify it, and then the definition of M, step by step:

eq 2
(1) N = A + B + (C + (D + F + E) + A)
(2) N = A + B + C + D + F + E + A
(3) N = A + B + C + D + F + E

(1) M = A + B + ({M} + D)
(2) M = A + B + {M} + D
(3) M = A + B + D
In short, flattening the structures can help in detecting and eliminating recursive and repetitive structures and behaviours. This should also hold for the state machine model presented earlier.

going messy

Flattening the structure leads to a very simple algorithm for preventing infinite loops: check whether the field or the node already exists in the state and, if it does, backtrack and carry on with your business.
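The check can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not binder's actual code: the names Wrap, flatten and DEFINITIONS are made up here, a node definition is written as a nested list, and a wrapped node that has not been seen yet is assumed to contribute its fields to the enclosing state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wrap:
    """A wrap field: a field whose content is another node, named by `node`."""
    node: str


def flatten(name, definitions, seen_nodes=None, seen_fields=None):
    """Return the flat, duplicate-free list of field names for node `name`.

    `definitions` maps a node name to a list whose items are field names
    (str), nested lists (compound fields) or Wrap objects.
    """
    seen_nodes = set() if seen_nodes is None else seen_nodes
    seen_fields = [] if seen_fields is None else seen_fields
    if name in seen_nodes:
        return seen_fields              # node already in the state: backtrack
    seen_nodes.add(name)
    _walk(definitions[name], definitions, seen_nodes, seen_fields)
    return seen_fields


def _walk(items, definitions, seen_nodes, seen_fields):
    for item in items:
        if isinstance(item, Wrap):      # follow the wrap, unless already visited
            flatten(item.node, definitions, seen_nodes, seen_fields)
        elif isinstance(item, list):    # compound field: flatten in place
            _walk(item, definitions, seen_nodes, seen_fields)
        elif item not in seen_fields:   # plain field: keep the first instance only
            seen_fields.append(item)


# The two example nodes from the equations above.
DEFINITIONS = {
    "N": ["A", "B", ["C", ["D", "F", "E"], "A"]],
    "M": ["A", "B", [Wrap("M"), "D"]],
}

print(flatten("N", DEFINITIONS))  # ['A', 'B', 'C', 'D', 'F', 'E']
print(flatten("M", DEFINITIONS))  # ['A', 'B', 'D']
```

Both results agree with step (3) of the simplifications: the second A is dropped from N, and the self-reference {M} is dropped from M.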

[Figure: binder virtual machine storage. All arrows mean references, i.e. different cells connected by arrows share the memory, or part of it.]

What are the consequences for the VM's memory? We need the node's state: the collection of fields, together with the associated actions triggered by signals and protected by guards. A list of nodes already has to be maintained because of list views. To check for uniqueness, a list of fields is required as well. The memory overhead should not be too big, given the typical requirements of a node, the fact that any single object is stored only once (the different structures hold references to objects, not copies), and the easily implementable incremental loading of objects: 'keep/load only what you need'.
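To make the 'references, not copies' point concrete, here is a rough Python sketch of such a storage. Everything in it is hypothetical: the class names Field, Node and Store, the method names, and the choice to key fields by their label are assumptions made for the example, not binder's real layout.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    label: str
    value: object = None


@dataclass
class Node:
    name: str
    state: list = field(default_factory=list)   # references to Field objects


class Store:
    """Hypothetical VM storage: one registry of nodes, one of fields."""

    def __init__(self):
        self.nodes = {}    # node name -> Node, the list needed for list views
        self.fields = {}   # field label -> Field, the list used for uniqueness

    def get_field(self, label):
        # Create the field on first use; afterwards hand out the same object.
        if label not in self.fields:
            self.fields[label] = Field(label)
        return self.fields[label]

    def attach(self, node_name, label):
        """Add a reference to field `label` to the node's state, once."""
        if node_name not in self.nodes:
            self.nodes[node_name] = Node(node_name)
        node = self.nodes[node_name]
        fld = self.get_field(label)
        if not any(existing is fld for existing in node.state):
            node.state.append(fld)
        return fld


store = Store()
a_in_n = store.attach("N", "A")
a_in_m = store.attach("M", "A")
print(a_in_n is a_in_m)   # True: both states point at one stored object
```

Since both node states hold the same object, a change made through one node is visible through the other, and attaching the same field twice to one node leaves a single entry in its state.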

In the diagramme you can see two other 'global' hashes: actions and guards. Why not keep the flexibility to assign an action to any field and a guard to any action, if it does not penalise performance? These hashes (associative arrays) are optional, and can be enabled on an 'as required' basis. I hope the diagramme is self-explanatory, albeit messy.
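One possible reading of the two optional hashes, sketched in Python. The Hooks class, its enable and fire methods, and the key shapes (a (field label, signal) pair for actions, the action itself for guards) are all assumptions made for illustration; the text above does not fix any of them.

```python
class Hooks:
    """Optional global hashes; both stay None until they are enabled."""

    def __init__(self):
        self.actions = None   # (field label, signal) -> action callable
        self.guards = None    # action callable -> guard predicate

    def enable(self):
        # Allocate the hashes only when somebody actually needs them.
        if self.actions is None:
            self.actions = {}
        if self.guards is None:
            self.guards = {}

    def fire(self, label, signal, value):
        """Run the action registered for (label, signal) if its guard allows it."""
        if self.actions is None:
            return value                  # hashes disabled: nothing to look up
        action = self.actions.get((label, signal))
        if action is None:
            return value                  # no action assigned to this field
        guard = self.guards.get(action)
        if guard is not None and not guard(value):
            return value                  # guard refused: value left untouched
        return action(value)


def shout(value):
    return value.upper()


def is_text(value):
    return isinstance(value, str)


hooks = Hooks()
hooks.enable()
hooks.actions[("title", "view")] = shout
hooks.guards[shout] = is_text

print(hooks.fire("title", "view", "hello"))  # HELLO
print(hooks.fire("title", "view", 42))       # 42
```

With the hashes left disabled, fire returns at its first check without any lookup, which is the sense in which the extra flexibility should not penalise performance.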

going chatty

Let's look at the language. It has three facets. The first is the signals received: view, update, insert, ... The second is the 'meta' operations: add, add new, remove and rename a field. The third important component is the fields. There is a possible fourth language component, which I would consider syntactic sugar: node templates. They are nearly equivalent to node types; the only difference is that there won't be any 'type' operations. A template is a predefined signature (what the types of the fields are, and how they are labelled) for a collection of fields. Is it critical? It depends. What is a (node) type? It is a signature and a label for everything with that signature. The proposed templates should cover the first point. And the label? Why not use labelling by taxonomy for the cases where we care about the type of a node?
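A template in this sense could be as small as a mapping from field labels to field types. The Python sketch below assumes exactly that; ARTICLE_TEMPLATE, new_node and has_signature are hypothetical names, and the node is modelled as a plain dictionary only for the sake of the example.

```python
# A template: how the fields are labelled, and what their types are.
ARTICLE_TEMPLATE = {"title": str, "body": str}


def new_node(template):
    """Build a collection of fields from a template, one empty value per label."""
    return {label: kind() for label, kind in template.items()}


def has_signature(node, template):
    """True if the node has exactly the template's labels, with matching types."""
    return node.keys() == template.keys() and all(
        isinstance(node[label], kind) for label, kind in template.items()
    )


node = new_node(ARTICLE_TEMPLATE)
print(node)                                   # {'title': '', 'body': ''}
print(has_signature(node, ARTICLE_TEMPLATE))  # True
```

Nothing here names a type: the signature does the structural work, and a label such as 'article' can be attached separately, for example as a taxonomy term.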