Red-black tree
From Wikipedia, the free encyclopedia
|
A red-black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically used to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer who called them "symmetric binary B-trees", but acquired its modern name in a paper in 1978 by Leo J. Guibas and Robert Sedgewick. It is complex, but has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is the number of elements in the tree.
TerminologyA red-black tree is a special type of binary tree, used in computer science to organize pieces of comparable data, such as numbers. In red-black trees, the leaf nodes are not relevant and do not contain data. To save memory, sometimes a single sentinel node performs the role of all leaf nodes. All references from internal nodes to leaf nodes instead point to the sentinel node. Red-black trees, like all binary search trees, allow efficient in-order traversal of elements provided that there is a way to locate the parent of any node. The search-time results from the traversal from root to leaf, and therefore a balanced tree, having the least possible tree height, results in O(log n) search time. PropertiesA red-black tree is a binary search tree where each node has a color attribute, the value of which is either red or black. In addition to the ordinary requirements imposed on binary search trees, the following additional requirements of any valid red-black tree apply:
These constraints enforce a critical property of red-black trees: that the longest path from the root to a leaf is no more than twice as long as the shortest path from the root to a leaf in that tree. The result is that the tree is roughly balanced. Since operations such as inserting, deleting, and finding values requires worst-case time proportional to the height of the tree, this theoretical upper bound on the height allows red-black trees to be efficient in the worst-case, unlike ordinary binary search trees. To see why these properties guarantee this, it suffices to note that no path can have two red nodes in a row, due to property 4. The shortest possible path has all black nodes, and the longest possible path alternates between red and black nodes. Since all maximal paths have the same number of black nodes, by property 5, this shows that no path is more than twice as long as any other path. In many presentations of tree data structures, it is possible for a node to have only one child, and leaf nodes contain data. It is possible to present red-black trees in this paradigm, but it changes several of the properties and complicates the algorithms. For this reason, this article uses "null leaves", which contain no data and merely serve to indicate where the tree ends, as shown above. These nodes are often omitted in drawings, resulting in a tree which seems to contradict the above principles, but which in fact does not. A consequence of this is that all internal (non-leaf) nodes have two children, although one or more of those children may be a null leaf. Some explain a red-black tree as a binary search tree whose edges, instead of nodes, are colored in red or black, but this does not make any difference. The color of a node in this article's terminology corresponds to the color of the edge connecting the node to its parent, except that the root node is always black (property 2) whereas the corresponding edge does not exist. Applications and related data structuresRed-black trees offer worst-case guarantees for insertion time, deletion time, and search time. Not only does this make them valuable in time-sensitive applications such as real-time applications, but it makes them valuable building blocks in other data structures which provide worst-case guarantees; for example, many data structures used in computational geometry can be based on red-black trees. The AVL tree is another structure supporting O(log n) search, insertion, and removal. It is more rigidly balanced than Red-Black trees, leading to slower insertion and removal but faster retrieval. Red-black trees are also particularly valuable in functional programming, where they are one of the most common persistent data structures, used to construct associative arrays and sets which can retain previous versions after mutations. The persistent version of red-black trees requires O(log n) space for each insertion or deletion, in addition to time. Red-black trees are an isometry of 2-3-4 trees. In other words, for every 2-3-4 tree, there is a corresponding red-black tree with data elements in the same order. The insertion and deletion operations on 2-3-4 trees are also equivalent to color-flipping and rotations in red-black trees. This makes 2-3-4 trees an important tool for understanding the logic behind red-black trees, and this is why many introductory algorithm texts introduce 2-3-4 trees just before red-black trees, even though 2-3-4 trees are not often used in practice. In 2008, Sedgewick introduced a simpler version of red-black trees called Left-Leaning Red-Black Trees by eliminating a previously unspecified degree of freedom in the implementation. This page, however, describes the 1978 version. OperationsRead-only operations on a red-black tree require no modification from those used for binary search trees, because every red-black tree is a special case of a simple binary search tree. However, the immediate result of an insertion or removal may violate the properties of a red-black tree. Restoring the red-black properties requires a small number (O(log n) or amortized O(1)) of color changes (which are very quick in practice) and no more than three tree rotations (two for insertion). Although insert and delete operations are complicated, their times remain O(log n). InsertionInsertion begins by adding the node as we would in a simple binary search tree and coloring it red. What happens next depends on the color of other nearby nodes. The term uncle node will be used to refer to the sibling of a node's parent, as in human family trees. Note that:
Each case will be demonstrated with example C code. The uncle and grandparent nodes can be found by these functions: <source lang="c"> struct node * grandparent(struct node *n) { if ((n != NULL) && (n->parent != NULL)) return n->parent->parent; else return NULL; } struct node * uncle(struct node *n) { struct node *g = grandparent(n); /* * Here we blindly assume that g != NULL, i.e. we suppose a rootless tree. * However, a real tree always have an uppermost root node which we can't * go past, so an appropriate tree boundary check would be required here. */ if (n->parent == g->left) return g->right; else return g->left; } </source> Case 1: The new node N is at the root of the tree. In this case, it is repainted black to satisfy Property 2 (The root is black). Since this adds one black node to every path at once, Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes) is not violated. <source lang="c"> void insert_case1(struct node *n) { if (n->parent == NULL) n->color = BLACK; else insert_case2(n); } </source> Case 2: The new node's parent P is black, so Property 4 (Both children of every red node are black) is not invalidated. In this case, the tree is still valid. Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes) is not threatened, because the new node N has two black leaf children, but because N is red, the paths through each of its children have the same number of black nodes as the path through the leaf it replaced, which was black, and so this property remains satisfied. <source lang="c"> void insert_case2(struct node *n) { if (n->parent->color == BLACK) return; /* Tree is still valid */ else insert_case3(n); } </source>
<source lang="c"> void insert_case3(struct node *n) { struct node *u = uncle(n), *g; if ((u != NULL) && (u->color == RED)) { n->parent->color = BLACK; u->color = BLACK; g = grandparent(n); g->color = RED; insert_case1(g); } else { insert_case4(n);
}
} </source>
<source lang="c"> void insert_case4(struct node *n) { struct node *g = grandparent(n); if ((n == n->parent->right) && (n->parent == g->left)) { rotate_left(n->parent); n = n->left; } else if ((n == n->parent->left) && (n->parent == g->right)) { rotate_right(n->parent); n = n->right; } insert_case5(n); } </source>
<source lang="c"> void insert_case5(struct node *n) { struct node *g = grandparent(n); n->parent->color = BLACK; g->color = RED; if ((n == n->parent->left) && (n->parent == g->left)) { rotate_right(g); } else { /* * Here, (n == n->parent->right) && (n->parent == g->right). */ rotate_left(g); } } </source> Note that inserting is actually in-place, since all the calls above use tail recursion. RemovalIn a normal binary search tree, when deleting a node with two non-leaf children, we find either the maximum element in its left subtree or the minimum element in its right subtree, and move its value into the node being deleted (as shown here). We then delete the node we copied the value from, which must have less than two non-leaf children. Because merely copying a value does not violate any red-black properties, this reduces the problem of deleting to the problem of deleting a node with at most one non-leaf child. It does not matter whether this node is the node we originally wanted to delete or the node we copied the value from. For the remainder of this discussion we can assume we are deleting a node with at most one non-leaf child, which we will call its child (if it has only leaf children, let one of them be its child). If we are deleting a red node, we can simply replace it with its child, which must be black. All paths through the deleted node will simply pass through one less red node, and both the deleted node's parent and child must be black, so properties 3 (All leaves, including nulls, are black) and 4 (Both children of every red node are black) still hold. Another simple case is when the deleted node is black and its child is red. Simply removing a black node could break Properties 4 (Both children of every red node are black) and 5 (All paths from any given node to its leaf nodes contain the same number of black nodes), but if we repaint its child black, both of these properties are preserved. The complex case is when both the node to be deleted and its child is black. We begin by replacing the node to be deleted with its child. We will call (or label) this child (in its new position) N, and its sibling (its new parent's other child) S. In the diagrams below, we will also use P for N's new parent, SL for S's left child, and SR for S's right child (it can be shown that S cannot be a leaf).
We will find the sibling using this function: <source lang="c"> struct node * sibling(struct node *n) { if (n == n->parent->left) return n->parent->right; else return n->parent->left; } </source>
We can perform the steps outlined above with the following code, where the function <source lang="c"> void delete_one_child(struct node *n) { /* * Precondition: n has at most one non-null child. */ struct node *child = is_leaf(n->right) ? n->left : n->right; replace_node(n, child); if (n->color == BLACK) { if (child->color == RED) child->color = BLACK; else delete_case1(child); } free(n); } </source>
If both N and its original parent are black, then deleting this original parent causes paths which proceed through N to have one fewer black node than paths that do not. As this violates Property 5 (All paths from any given node to its leaf nodes contain the same number of black nodes), the tree must be rebalanced. There are several cases to consider: Case 1: N is the new root. In this case, we are done. We removed one black node from every path, and the new root is black, so the properties are preserved. <source lang="c"> void delete_case1(struct node *n) { if (n->parent != NULL) delete_case2(n); } </source>
<source lang="c"> void delete_case2(struct node *n) { struct node *s = sibling(n); if (s->color == RED) { n->parent->color = RED; s->color = BLACK; if (n == n->parent->left) rotate_left(n->parent); else rotate_right(n->parent); } delete_case3(n); } </source>
<source lang="c"> void delete_case3(struct node *n) { struct node *s = sibling(n); if ((n->parent->color == BLACK) && (s->color == BLACK) && (s->left->color == BLACK) && (s->right->color == BLACK)) { s->color = RED; delete_case1(n->parent); } else delete_case4(n); } </source>
<source lang="c"> void delete_case4(struct node *n) { struct node *s = sibling(n); if ((n->parent->color == RED) && (s->color == BLACK) && (s->left->color == BLACK) && (s->right->color == BLACK)) { s->color = RED; n->parent->color = BLACK; } else delete_case5(n); } </source>
<source lang="c"> void delete_case5(struct node *n) { struct node *s = sibling(n); if ((n == n->parent->left) && (s->color == BLACK) && (s->left->color == RED) && (s->right->color == BLACK)) { s->color = RED; s->left->color = BLACK; rotate_right(s); } else if ((n == n->parent->right) && (s->color == BLACK) && (s->right->color == RED) && (s->left->color == BLACK)) { s->color = RED; s->right->color = BLACK; rotate_left(s); } delete_case6(n); } </source>
<source lang="c"> void delete_case6(struct node *n) { struct node *s = sibling(n); s->color = n->parent->color; n->parent->color = BLACK; if (n == n->parent->left) { /* * Here, s->right->color == RED. */ s->right->color = BLACK; rotate_left(n->parent); } else { /* * Here, s->left->color == RED. */ s->left->color = BLACK; rotate_right(n->parent); } } </source> Again, the function calls all use tail recursion, so the algorithm is in-place. Additionally, no recursive calls will be made after a rotation, so a constant number of rotations (up to 3) are made. Proof of asymptotic boundsA red black tree which contains n internal nodes has a height of O(log(n)). Definitions:
Lemma: A subtree rooted at node v has at least Failed to parse (Missing texvc executable; please see math/README to configure.): 2^{bh(v)}-1 internal nodes. Proof of Lemma (by induction height): Basis: h(v) = 0 If v has a height of zero then it must be null, therefore bh(v) = 0. So:
internal nodes implies that Failed to parse (Missing texvc executable; please see math/README to configure.): v' such that h(Failed to parse (Missing texvc executable; please see math/README to configure.): v' ) = k+1 has Failed to parse (Missing texvc executable; please see math/README to configure.): 2^{bh(v')}-1 internal nodes. Since Failed to parse (Missing texvc executable; please see math/README to configure.): v' has h(Failed to parse (Missing texvc executable; please see math/README to configure.): v' ) > 0 it is an internal node. As such it has two children which have a black-height of either bh(Failed to parse (Missing texvc executable; please see math/README to configure.): v' ) or bh(Failed to parse (Missing texvc executable; please see math/README to configure.): v' )-1 (depending on whether Failed to parse (Missing texvc executable; please see math/README to configure.): v'
is red or black). By the inductive hypothesis each child has at least Failed to parse (Missing texvc executable; please see math/README to configure.): 2^{bh(v')-1}-1
internal nodes, so Failed to parse (Missing texvc executable; please see math/README to configure.): v'
has:
Using this lemma we can now show that the height of the tree is logarithmic. Since at least half of the nodes on any path from the root to a leaf are black (property 4 of a red black tree), the black-height of the root is at least h(root)/2. By the lemma we get:
ComplexityIn the tree code there is only one loop where the node of the root of the red-black property that we wish to restore, x, can be moved up the tree by one level at each iteration. Since the original height of the tree is O(log n), there are O(log n) iterations. So overall the insert routine has O(log n) complexity. References
External linksDemos
Implementations
de:Rot-Schwarz-Baum es:Árbol rojo-negro fa:درخت قرمز- سیاه fr:Arbre bicolore ko:레드-블랙 트리 hr:Crveno-crno stablo id:Pohon merah-hitam it:Albero rosso-nero he:עץ אדום שחור lt:Raudonai-Juodas medis ja:赤黒木 pl:Drzewo czerwono-czarne pt:Árvore rubro-negra ru:Красно-чёрное дерево sr:Црвено-црно стабло fi:Punamusta puu sv:Röd-svart träd vi:Cây đỏ đen uk:Червоно-чорне дерево |










