RSS Feed

Peter Goodman's blog about PHP, Parsing Theory, C++, Functional Programming, Applications,

Nested Sets (Modified Pre-Order Tree Traversal Algorithm)

Many people stay away from nested sets. The whole idea of a node having a left and right identifier to determine nesting confuses people. That or people dislike it because they feel that it de-normalizes their database. In this article I am not going to go into how to implement the mptt algorithm; you can find my own full implementation of mptt in my code folder. Instead, I am going to show how to determine the depth of a node at runtime (in PHP) with only the knowledge of the left and right ids of a node. A quick note: my solution is an iterator and uses neither recursion nor lookbacks or lookaheads.

I would expect that this would be the usual solution to the problem. Most people tend to put the heavy lifting in SQL by doing a COUNT of every node with a left id that is less than the current one and a right id that is greater than the current one (imagine the roots coming off of the trunk of a tree, this starts at the base of a root and looks up to the trunk). That is a sufficient solution; however, it is expensive for a large tree. The solution itself also requires that the depth be calculated for every node in advance of getting the nodes--regardless of whether or not we will use/display them.

So, we know that the data we are working with is a tree. Trees can be represented in many different ways in code. One such way is with a stack. If you remember, I used a stack to parse HTML as the pushing/popping function of a stack cleanly represents entering a new branch and then leaving one. Our approach is known: we will use a stack. But, how will a stack be applied to a result set of nodes?

First, the ordering of the nodes is crucial. Our end goal might be to make a threaded look for the tree (hence the usefulness of knowing the depth) and getting the nodes out in their threading order is simple: they need to be in ascending order of their left ID. If you do not understand why then consult this image on the MySQL website and look closely at the left/right ids. So, we can easily fulfill the first requirement: the nodes need to be in their threading order.

Second, it can be easily shown that for any node on the tree, the formula: (right_id - left_id - 1) / 2 will yield the total number of nodes below it. The total part is important! If the nodes were laid out in a flat array, then given our first assumption--that the nodes are in their threading order--then we can further assume that the result of this formula means that the following X number of nodes in the array after the current node (X being the total number of children of a given node) will be at least one level deeper than our current node.

Okay, so we have an inkling of how this will work. We get to a node, and we know that the next X nodes are children. That's simple enough; however, for nesting above one level, the thinking process gets more confusing: how do we keep track of the number of children there is and that we've already looped over if every parent of a given node includes the number of the children of a given node. Unfortunately, this last sentence might be poorly worded, so I will try to illustrate the problem differently. Consider the following tree:

               +----- child a (1) +----- child b (0)
root (3) ---|
               +----- child c (0)
array(
    'root',
    'child a',
    'child b',
    'child c',
)

Beside each node I have the total number of children that the node has in braces. After that I have a simplified version of the array of the nodes as they would be available from a result set. So, when we get to root we know that the next three nodes are children (child a, child b, child c). When we get to child a we know that the next one node is a child (child b). At child a, we know that we've consumed 1/3 of the children of root. When we're at child b, we know that we've consumed 1/1 children of child b, but how are we to say that we've now consumed 2/3 of the children of root? Finally, when we get to child c, we have consumed 3/3 of the children of root, and can now say that we are back to the same level as root is on.

So, we know that as we go we need to consume nodes. Given that we know how many children a node has, as we iterate over those children, we can reduce that counter by 1 for each node visited so that when the counter hits zero, we know that we're done looking at children and we're back to the parent nodes depth. The only problem is introduced with more than one levels of nesting. So we consume nodes and that affects the counter on the parent node, but it also needs to affect the counters on all parent nodes way up in the tree. Updating the counters of every parent node each time a node is consumed would be a waste of resources. Instead, it can easily be done when we're done looking at all the children of a given node. If we store the number of children twice--once acting as a counter that will be consumed, the other acting as a reference for how many total children were consumed--then we can accurately update parent counters as we go along.

The final solution (in pseudo code) is that at every node we encounter, we push pair($counter, $counter) onto the stack. As we encounter child nodes, we consume a node by reducing top($stack)[0] by one, and leave top($stack)[1] alone. By definition, count($stack) is the current depth that we are on. When top($stack)[0] < = 0, we do $temp = pop($stack) and then account for all the nodes we just consumed by using the second counter: top($stack)[0] -= $temp[1]. We clean up by constantly looking is top($stack)[0] <= 0 and popping those parts off.

That pseudo-code version is lacking but illustrates all of what actually needs to be done. For some, this might have originally seemed like a difficult problem (from the programming side, not from the SQL side). However, the solution is quite simple when we look at the problem for what it is: a tree, and not explicitly what we want: the depth of a node. Go check out the solution to how to find the depth of nodes for a nested set tree in my code folder (mostly comments, includes example at the bottom of the file).

<?php

!defined('IN_APP') && exit();

/**
 * Given a flat array of nodes in a mptt (nested set) tree, in ascending order
 * of their left id, figure out what level each node is on. The only things
 * required to figure out what level a node is on is its left and right ids.
 * This iterator uses a stack to keep track of levels.
 * 
 * @author Peter Goodman
 * @copyright Copyright (c) 2008, Peter Goodman. All rights reserved.
 */
class NestedSetIterator extends ProxyIterator {

    const LEFT_ID = 'left_index',
          RIGHT_ID = 'right_index',
          LEVEL = '_level';

    private $nodes,
            $level,
            $stack;

    /* Constructor, take in an array or iterator. If an array is passed, turn
       it into an iterator.
     */
    public function __construct(Iterator $nodes) {
        parent::__construct($nodes);
        $this->rewind();
    }

    /* Push an item onto the stack. The stack holds arrays of pairs that have
       the number of children of a given node. Pair[0] is reduced when an item
       is consumed and pair[1] is kept to reduce the parent stack elements
       pair[0] when the current top stack element is popped.
     */
    private function push(&$node) {

        if(!is_array($node) && !($node instanceof ArrayAccess))
            throw new Exception("Node is not an array nor has array access.");

        $num = ($node[self::RIGHT_ID] - $node[self::LEFT_ID] - 1) / 2;
        $this->stack[] = array($num, $num);
        $this->level++;
    }

    /* Pop a pair off the stack and update the number of consumed nodes of the
       new top node.
     */
    private function pop() {

        $level = array_pop($this->stack);
        $this->level--;

        /* We have just popped a level off of the stack. Consume works by
           decreasing the number of nodes on the top of the stack, but each
           stack elem knows the *total* number of nodes below it, and so 
           consuming nodes on the top of the stack doesn't account for lower
           elements (parent nodes). Therefor we decrease the current top of
           stack (after the pop) by how many total nodes there were originally
           in the node elem we just popped off.
         */
        $this->consume($level[1]);
    }

    /* Reduce the number of nodes consumed for the top pair of the stack. */
    private function consume($num = 1) {
        if(isset($this->stack[$this->level]))
            $this->stack[$this->level][0] -= $num;
    }

    /* Clean up after stack elements that have no more nodes to consume. */
    private function cleanup() {

        /* Move down the stack elems until we hit a level that still has nodes
           that haven't been consumed.
         */
        for($i = $this->level; isset($this->stack[$i]); $i--) {

            if($this->stack[$i][0] > 0)
                break;

            $this->pop();
        }
    }

    /*
     * Iterator methods
     */

    /* Get the current node */
    public function &current() {

        $node = parent::current();

        /* Use cached version of level so that it doesn't need to go through
           the normal consume/push/pop phase.
         */
        if(isset($node[self::LEVEL]))
            return $node;

        /* Cache doesn't exist, consume the node and push a new element onto
           the stack.
         */
        $this->consume();
        $this->push($node);

        /* Store the level in an unique name */
        $node[self::LEVEL] = $this->level;

        return $node;
    }

    /* Move to the next node. */
    public function next() {
        $this->cleanup();
        parent::next();
    }

    /* Reset everything. */
    public function rewind() {
        $this->stack = array();
        $this->level = -1;
        parent::rewind();
    }
}

Comments


Comment