Version: 0.4.24

4. AVL Trees#

Lecture:

Lecture 5.4 (slides)

Objectives:

Understand why BST must remain well-balanced. Understand how this is done in AVL trees.

Concepts:

balancing factor, tree rotations, insertion and deletion in AVL trees

Implementation:

Python avl.py

Binary search trees (BST) offer an overall efficient implementation of the ordered set ADT (see Lecture 5.2). In the worst case, however, they perform no better than a linked list. Let us see how self-balancing trees fix that.

There are several types of self-balancing trees: B-Trees, Red-black trees, scapegoat trees, etc. They differ in how they detect that the tree is getting out of balance. We shall look at AVL trees, which control the difference between the heights of the left and right branches.

4.1. Why Can BSTs Be Inefficient?#

All BST operations are efficient in the average case (i.e., \(O(\log n)\)), whereas in the worst case, they run in linear time (see Lecture 5.2). What is this average case? It depends on the overall “shape” of the tree. Contrast the two trees shown below on Fig. 4.10 (opposite) and Fig. 4.11, respectively.

../../_images/well_balanced.svg

Fig. 4.11 A well balanced binary search tree#

Fig. 4.11 shows a “well-balanced” tree: All branches have the same length. The tree has a height of 2, and the smaller the height, the faster operations such as oset.contains() become. By contrast, the tree from Fig. 4.10 is just one long, thin “zig-zag” branch, with a height of 6. This is a linked list in disguise.

Important

The order in which we add items to a BST determines the shape of the tree, and, in turn, how efficient the operations will be.

Both trees contain exactly the same set of items. What governs the shape is the order in which one adds these items. The well-balanced tree from Fig. 4.11 results from a sequence of insertions such as \(s = (4, 2, 3, 6, 7, 1, 5)\), as shown on Fig. 4.12 below. If we insert items in a different order, say \(s'=(1, 7, 2, 6, 3, 4, 5)\), we get the degenerate tree from Fig. 4.10.

../../_images/insertions.svg

Fig. 4.12 Growing a binary search tree one item at a time.#

Insertion sequences in ascending or descending order, for instance, necessarily yield “degenerate” trees. Consider for example \(s=(1,2,3,4,5,6,7)\): each new item is larger than all those before it, so it always becomes the rightmost child, and the tree grows into a single long branch.
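We can observe this effect with a small experiment. The sketch below uses a minimal, non-balancing BST (the class and function names are illustrative, not taken from the lecture's avl.py) and measures the height produced by the two insertion orders discussed above:

```python
class Node:
    """A bare-bones BST node: an item and two child references."""
    def __init__(self, item):
        self.item = item
        self.left = None
        self.right = None

def insert(root, item):
    """Insert 'item' into the BST rooted at 'root'; return the new root."""
    if root is None:
        return Node(item)
    if item < root.item:
        root.left = insert(root.left, item)
    elif item > root.item:
        root.right = insert(root.right, item)
    return root

def height(root):
    """Height of a tree; the empty tree has height -1 by convention."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))

def build(sequence):
    """Grow a BST by inserting the items of 'sequence' one at a time."""
    root = None
    for item in sequence:
        root = insert(root, item)
    return root

balanced = build([4, 2, 3, 6, 7, 1, 5])     # the order from Fig. 4.12
degenerate = build([1, 2, 3, 4, 5, 6, 7])   # ascending order
print(height(balanced))    # 2: every branch has the same length
print(height(degenerate))  # 6: a linked list in disguise
```

Same seven items, two very different shapes: only the insertion order differs.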

4.2. AVL Trees#

If we want BSTs to be efficient at all times, we need a way to maintain the “balance” of the tree. There are several strategies to build such self-balancing trees:

  • AVL Trees: Maintain balance by correcting difference in the height of subtrees.

  • Scapegoat Trees: Maintain balance by correcting difference in the size of subtrees.

  • Red-black Trees: Maintain balance by ensuring that the longest path from the root to a leaf is no more than twice the length of the shortest path. This is done by assigning colors (red or black) to nodes and enforcing specific constraints.

  • B-Trees: Generalize the idea of search trees to n-ary trees (see Lecture 5.5)

In the remainder, we look at AVL [#avl] trees and see how to implement the oset.add() and oset.remove() operations.

Important

The idea of AVL trees is to quantify how well a tree is balanced, and to adjust the tree whenever the balance falls outside the accepted interval. To adjust the tree, we use a special operation named a rotation.

4.2.1. Measuring Balance#

How can we detect when a tree is well balanced, or when it starts to degenerate into a linked list? The idea of AVL trees is to compare the heights of the two subtrees (left and right). Recall that the height of a tree is the depth of its deepest descendant.

Let’s define a binary tree as a triplet \(T=(r, L, R)\) where \(r\) is the root, \(L\) the left subtree, and \(R\) the right subtree. We define the height of a tree as a function \(h(T)\) such that:

\[\begin{split}h(T) = \begin{cases} -1 & \textrm{if } T = \varnothing \\ 1 + \max(h(L), h(R)) & \textrm{if } T = (r, L, R) \end{cases}\end{split}\]

We now define the balance factor \(b\) as the difference in height between the left and right subtree, or more formally a function \(b(T)\) such that:

\[\begin{split}b(T) = \begin{cases} 0 & \textrm{if } T = \varnothing \\ h(L) - h(R) & \textrm{otherwise} \end{cases}\end{split}\]

With this definition, a well-balanced tree has a balance factor of zero, a left-heavy tree has a positive balance factor, and a right-heavy tree a negative balance factor. Consider Fig. 4.13 for example. At the root, the tree is well-balanced, although it is not complete. Both subtrees B and C have a height of 2, despite their different shapes. B is well-balanced as well, but C is left-heavy.

../../_images/balance.svg

Fig. 4.13 A binary tree annotated with balance information for each node#
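The two definitions above translate directly into Python. This is a minimal sketch on a bare-bones node class (illustrative names, not necessarily those of the lecture's avl.py):

```python
class Node:
    """A binary tree node: T = (r, L, R)."""
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right

def height(tree):
    """h(T): -1 for the empty tree, else 1 + max of the subtree heights."""
    if tree is None:
        return -1
    return 1 + max(height(tree.left), height(tree.right))

def balance_factor(tree):
    """b(T): 0 for the empty tree, else h(L) - h(R)."""
    if tree is None:
        return 0
    return height(tree.left) - height(tree.right)

# A small left-heavy example: the left subtree has height 1, the right 0.
tree = Node(4, Node(2, Node(1)), Node(6))
print(balance_factor(tree))        # 1: left-heavy, but still within [-1, 1]
print(balance_factor(tree.left))   # 1
print(balance_factor(tree.right))  # 0
```

Note that recomputing heights on the fly like this costs \(O(n)\) per call; a production implementation would cache the height in each node and update it as the tree changes.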

4.2.2. Rotation to Restore Balance#

Now that we know when the tree starts to degenerate, we must reorganize its items to improve the balance. In an AVL tree, each node must have a balance factor within the interval \([-1,1]\). Whenever the balance factor lands outside this interval, we must adjust the tree. AVL trees (and other types of search trees) rely on rotations: re-organizations of nodes that preserve the BST invariant (any node is larger than or equal to any of its left descendants, and smaller than any of its right descendants).

Important

An AVL tree maintains the balance factor of every node in the range \([-1, 1]\). Should the balance factor fall outside this interval, we apply one or more rotations to correct it.

So we take action whenever the balance factor becomes either 2 or -2. An insertion changes the balance factor of a node by at most one, so these are the only out-of-range values we can encounter. Let us first consider the cases where the balance factor is -2; the cases where it is 2 are symmetrical. There are two cases:

  • One child is outer heavy, shown below on Fig. 4.14: Either the right child is right-heavy (\(b=-2\) shown below) or the left child is left-heavy (\(b=2\), symmetric). Node A has a balance factor \(b = -2\). Its left child is a tree of height \(h\), but its right child is a tree of height \(h + 2\) because of its own right child that has a height of \(h+1\) itself.

    ../../_images/outer.svg

    Fig. 4.14 One child is “outer-heavy”#

  • One child is inner heavy, shown below on Fig. 4.15: Either the right child is left-heavy (\(b=-2\), shown below), or the left child is right-heavy (\(b=2\), symmetric). Node A still has a balance factor of \(b=-2\), but here, it is the left child (inner) of Node B that has a height of \(h+1\).

    ../../_images/inner.svg

    Fig. 4.15 One child is “inner-heavy”#

4.2.2.1. Repairing the Outer Case#

We solve the outer case shown by Fig. 4.14 using a single rotation, as shown on Fig. 4.16. On the left-hand side, the tree is outer-heavy, as we saw on Fig. 4.14. To correct this, we rotate A to the left. This rotation “lifts up Node B” as the root of the tree, with Node A as its left child. It moves the subtree \(A < x \leq B\) to become the right child of Node A. The result, on the right-hand side, is a tree whose balance factor is zero.

../../_images/rotation.svg

Fig. 4.16 Applying rotation to restore balance#

Note that the resulting tree (on the right-hand side) preserves the BST invariant: despite moving subtrees around, the ordering of nodes and subtrees is still correct. The inverse operation is to rotate Node B to the right (on the right-hand side of Fig. 4.16).

Caution

The direction of these rotations is a matter of convention. I define a “left rotation” as the operation that lifts up the right child, and a “right rotation” as the operation that lifts up the left child. Others may define them the other way around.
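The single rotation can be sketched in a few lines, assuming nodes carry plain left/right references (a hypothetical helper, not necessarily how the lecture's avl.py spells it):

```python
class Node:
    """A minimal BST node used to illustrate rotations."""
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right

def rotate_left(a):
    """Lift up the right child B of A; return B as the new subtree root.

    B's left subtree (the items x with A < x <= B) moves to become
    A's right child, which preserves the BST ordering invariant.
    """
    b = a.right
    a.right = b.left   # the "inner" subtree changes parent
    b.left = a
    return b

# Outer-heavy example: 1 -> 2 -> 3 as a right-leaning chain (b = -2 at 1).
root = Node(1, None, Node(2, None, Node(3)))
root = rotate_left(root)
print(root.item)        # 2: lifted up to the root
print(root.left.item)   # 1
print(root.right.item)  # 3
```

The symmetric rotate_right (for the \(b=2\) outer case) mirrors this code, swapping every left for right.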

4.2.2.2. Repairing the Inner Case#

Restoring the balance in the inner configuration (see Fig. 4.15) requires two rotations, as Fig. 4.17 illustrates. On the left-hand side, we expanded the inner case of Fig. 4.15 and revealed the inner subtree of Node B: Its root is Node C, which has two subtrees of height \(h\). To return to a configuration with a balance factor of zero, we first rotate B to the right, which brings us back to the outer case, which we know how to solve. We then rotate A to the left.

../../_images/double_rotation.svg

Fig. 4.17 Restoring the inner case using two rotations in sequence#

Both rotations are the same operation we used for the outer case. The first rotation targets the child (Node B), whereas the second targets the root of the tree (Node A).
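The inner-right repair can be sketched with the same two hypothetical rotation helpers, applied in sequence:

```python
class Node:
    """A minimal BST node used to illustrate the double rotation."""
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right

def rotate_left(a):
    """Lift up the right child of A."""
    b = a.right
    a.right = b.left
    b.left = a
    return b

def rotate_right(a):
    """Lift up the left child of A (the mirror image of rotate_left)."""
    b = a.left
    a.left = b.right
    b.right = a
    return b

# Inner-right case: A=1's right child B=3 is left-heavy through C=2.
a = Node(1, None, Node(3, Node(2), None))
a.right = rotate_right(a.right)  # first rotation: back to the outer case
a = rotate_left(a)               # second rotation: restore balance
print(a.item, a.left.item, a.right.item)  # 2 1 3
```

After the two rotations, C (here the node 2) ends up at the root, with the former root and its child as its two subtrees, exactly as on the right-hand side of Fig. 4.17.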

4.2.2.3. The “rebalance” Procedure#

To re-balance a node, we must first identify whether we are in the inner or the outer case, and whether it occurs on the left or on the right child.

One way to distinguish between these cases is to look at the balance factor: If it is positive, the problem lies in the left child; if it is negative, in the right child.

There are four cases altogether:

  • Outer left: The node \(n\) is left-heavy, and its left child is also left-heavy. We can fix this with a right rotation of Node \(n\).

  • Inner left: The node \(n\) is left-heavy, and its left child is right-heavy. We can fix this with a left rotation of the left child, followed by a right rotation of Node \(n\).

  • Outer right: The node \(n\) is right-heavy, and its right child is also right-heavy. We can fix this with a left rotation of Node \(n\).

  • Inner right: The node \(n\) is right-heavy, and its right child is left-heavy. We can fix this with a right rotation of the right child, followed by a left rotation of Node \(n\).
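The four-way dispatch above can be sketched as a single rebalance procedure. This is a simplified version that recomputes heights on the fly (illustrative names, not necessarily the lecture's avl.py):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def balance_factor(t):
    return 0 if t is None else height(t.left) - height(t.right)

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b

def rebalance(node):
    """Return the (possibly new) root after correcting a +/-2 imbalance."""
    b = balance_factor(node)
    if b > 1:                                 # problem in the left child
        if balance_factor(node.left) < 0:     # inner left: extra rotation
            node.left = rotate_left(node.left)
        return rotate_right(node)             # outer left
    if b < -1:                                # problem in the right child
        if balance_factor(node.right) > 0:    # inner right: extra rotation
            node.right = rotate_right(node.right)
        return rotate_left(node)              # outer right
    return node                               # already within [-1, 1]

outer = rebalance(Node(1, None, Node(2, None, Node(3))))   # outer right
inner = rebalance(Node(1, None, Node(3, Node(2), None)))   # inner right
print(outer.item, inner.item)  # 2 2
```

In both examples the middle item ends up at the root, with a balance factor of zero.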

4.2.3. Insertion#

Now that we are able to rebalance the tree should it start to degenerate, we need to rebalance it after every insertion. An insertion may increase the height of some subtrees by one, and all these subtrees (along the path that leads to the new node) may need “re-balancing”.
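In a recursive implementation, this amounts to a plain BST insert that calls rebalance on every node as the recursion returns along that path. A self-contained sketch (heights are recomputed naively here; a real avl.py would cache them per node):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def balance_factor(t):
    return 0 if t is None else height(t.left) - height(t.right)

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b

def rebalance(node):
    b = balance_factor(node)
    if b > 1:
        if balance_factor(node.left) < 0:
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1:
        if balance_factor(node.right) > 0:
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def add(node, item):
    """BST insert, rebalancing every node on the way back up the path."""
    if node is None:
        return Node(item)
    if item < node.item:
        node.left = add(node.left, item)
    elif item > node.item:
        node.right = add(node.right, item)
    return rebalance(node)

# Inserting in ascending order now yields a balanced tree, not a chain.
root = None
for item in (1, 2, 3, 4, 5, 6, 7):
    root = add(root, item)
print(height(root))  # 2
```

Compare with the plain BST from Section 4.1: the same worst-case sequence \(s=(1,2,3,4,5,6,7)\) now produces a tree of height 2 instead of 6.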

4.2.4. Deletion#

Recall the deletion algorithm for BST: We first find the node to delete. If it has two children, we replace it with its predecessor. If it has only one child, we replace it with that child. Otherwise, we simply delete the node.

Like insertion, deletion modifies the height of all the nodes along the “path” to the node we want to delete, and all these nodes may need to be re-balanced. As the recursive calls return along this path, we call our rebalance procedure on each node.
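AVL deletion can thus be sketched as the standard BST removal followed by a rebalance at every node on the return path. As before, this is a simplified, self-contained version with naive height computation (illustrative names, not necessarily the lecture's avl.py):

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def balance_factor(t):
    return 0 if t is None else height(t.left) - height(t.right)

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b

def rebalance(node):
    b = balance_factor(node)
    if b > 1:
        if balance_factor(node.left) < 0:
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1:
        if balance_factor(node.right) > 0:
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def remove(node, item):
    """BST delete, rebalancing every node on the way back up the path."""
    if node is None:
        return None                  # item not found: nothing to do
    if item < node.item:
        node.left = remove(node.left, item)
    elif item > node.item:
        node.right = remove(node.right, item)
    elif node.left is None:
        return node.right            # zero or one child
    elif node.right is None:
        return node.left
    else:                            # two children: swap in the predecessor
        pred = node.left
        while pred.right is not None:
            pred = pred.right
        node.item = pred.item
        node.left = remove(node.left, pred.item)
    return rebalance(node)

# Deleting the whole left side forces a rotation at the root.
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
for item in (1, 3, 2):
    root = remove(root, item)
print(root.item)  # 6: a left rotation kicked in when the left side emptied
```

Unlike insertion, a single deletion can require rotations at several nodes along the path, which is why rebalance runs at every level of the recursion.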