So, we've built up some pretty nifty binary trees - we can use the binary tree both as the
basis of an implementation of a set, or as an implementation of a dictionary. But our
implementation has had one major problem: it's got absolutely no way to maintain balance. What
that means is that depending on the order in which things are inserted to the tree, we might
have excellent performance, or we might be no better than a linear list. For example, look at
these trees. As you can see, a tree with the same values can wind up quite different. In a good insert order, you can wind up with a nicely balanced tree: the minimum distance from root to leaf is 3; the maximum is 4. On the other hand, take the same values, and insert them in a different order and you
get a rotten tree; the minimum distance from root to leaf is 1, and the maximum is 7. So depending on
luck, you can get a tree that gives you good performance, or one that ends up giving you no better than a plain old list.

Today we're going to look at fixing that problem. That's really more of a lesson in data
structures than it is in Haskell, but we'll need to write more complicated and interesting
data structure manipulation code than we have so far, and it'll be a lollapalooza of pattern
matching. What we're going to do is turn our implementation into a red-black tree.

A
red-black tree is a normal binary search tree, except that each node is assigned a
color, which is either red or black. The colored tree is maintained so that the
following properties always hold:

1. The root of the tree is always black.
2. All branches of a tree end in a null which is black.
3. All children or red nodes are black.
4. For all nodes in the tree, all downward paths from the node to a leaf contain the
same number of *black* nodes.

If these invariants are maintained, they guarantee that tree is almost balanced:
from any node in the tree, the longest path from that node to a leaf is no more than twice the
shortest path from that node to a root.

We'll start by writing the Haskell type declaration for a red/black tree. We'll just do it
as a tree of ordered values to keep things simple.

```>data Color = Red | Black deriving (Eq, Show)
>
>data (Ord a) => RedBlackTree a = RBTNode a Color (RedBlackTree a) (RedBlackTree a)
>                            | RBTEmpty deriving (Eq,Show)
```

Also, for convenience, we'll write a couple of accessor functions that we'll
use later on. Something interesting to note about these accessors is that they use
non-exhaustive patterns: there are values of type `RedBlackTree a`
for which these functions are undefined. If you call the function on one of those
values, you'll get a runtime error. It's generally considered bad style to write
non-exhaustive functions like this, but the best solution to that would
be to use `Maybe` types, and they'll have a nasty impact on the
complexity of some of the code later, we won't worry about that for now; I'll explain
how to fix that up in a future post, when we get to monads.

```>
>rbtLeftChild :: (Ord a) => RedBlackTree a -> RedBlackTree a
>rbtLeftChild (RBTNode _ _ l _) = l
>
>rbtRightChild :: (Ord a) => RedBlackTree a -> RedBlackTree a
>rbtRightChild (RBTNode _ _ _ r) = r
>
>rbtValue :: (Ord a) => RedBlackTree a -> a
>rbtValue (RBTNode v _ _ _) =  v
>
>rbtColor :: (Ord a) => RedBlackTree a -> Color
>rbtColor (RBTNode _ c _ _) = c
>rbtColor RBTEmpty = Black
```

Inserting data into the tree is where things get interesting. You start by doing
basically the same thing you did in the simple binary search tree; the only
difference is that the node needs a color. New nodes are always red. Once we've done
that initial part of the insert, we need to verify that all of the invariants hold. If not,
we'll need to fix the tree. To keep things reasonably clean and separate, we'll use the
tail-calling version of tree insert that I showed in this post, and
then tail-call a rebalance function when the basic insert is complete. Rebalance will
fix the balance of the tree, and do the tree re-assembly as it climbs up the tree.

```>rbtInsert :: (Ord a) => RedBlackTree a -> a -> RedBlackTree a
>rbtRebalance :: (Ord a) => RedBlackTree a -> [RedBlackTree a] -> RedBlackTree a
>--rbtRebalance               focus              ancestors
>
>rbtInsert node v =
>    rbtInsertTailCall node v []
>
>rbtInsertTailCall node@(RBTNode v color left right) newval path
>             | v > newval = rbtInsertTailCall left newval (node:path)
>             | otherwise = rbtInsertTailCall right newval (node:path)
>rbtInsertTailCall RBTEmpty v path =
>    	rbtRebalance (RBTNode v Red RBTEmpty RBTEmpty) path
```

All over the place as we rebalance the tree, we'll have places where we want to
"rebuild" nodes to patch in the insertion change; as usual, we separate that into
its own function.

```>-- Reconstruct takes a child node and a parent node, and creates a replacement
>-- for the parent node with the child in the appropriate position. It allows
>-- the color of the new node to be specified.
>reconstructNode node@(RBTNode v c l r) parent@(RBTNode pv pc pl pr) color =
>   if (pv > v)
>      then (RBTNode pv color node pr)
>      else (RBTNode pv color pl node)
```

Now, we need to think about what we're going to do to keep the tree balanced as we walk
back up the insertion path fixing the tree. There are two things we can do to make the
tree respect the invariants: we can re-color nodes, or we can pivot subtrees.

Pivoting a tree is an interesting operation - it's a process of swapping a node and one of
its children to rotate a section of the tree. Suppose we have a binary search tree like the one
in the diagram to the side. It's poorly balanced; it's got only one node to its left, but 7
nodes to its right. To correct this by pivoting, what we'll do is take node 6 - currently a
child of the root, and rotate the tree counterclockwise around it, so that 6 becomes the root,
the old root (2) becomes the left child of 6, and the old left child of 6 (node 4) becomes the
right child of the old root.

So after the pivot, our tree looks like this. This
operation was a left pivot; a right pivot does the same kind of thing, but rotating
the tree clockwise instead of counterclockwise.

So let's go ahead and write the pivot operations. We'll write two pivot
functions: one for each direction. We'll pass the pivot operation
a subtree whose root and child in the appropriate direction are to be rotated. In addition,
we'll also add a parameter for managing the color of the new root node. In some cases,
we'll want to swap the colors of the nodes being moved; in other cases, we won't. So we'll
put a boolean parameter in to specify whether or not to swap the colors.

```> -- pivot left tree at root; second parent indicates whether or not to swap
> -- colors of the nodes that are being moved.
>rbtPivotLeft :: (Ord a) => RedBlackTree a -> Bool -> RedBlackTree a
>rbtPivotLeft (RBTNode rootval rootcolor sib (RBTNode focval foccolor focleft focright)) swap =
>       (RBTNode focval newrootcolor oldroot focright) where
>             newrootcolor = if swap then rootcolor else foccolor
>             oldrootcolor = if swap then foccolor else rootcolor
>             oldroot = RBTNode rootval oldrootcolor sib focleft
>
>
>rbtPivotRight (RBTNode rootval rootcolor (RBTNode focval foccolor focleft focright) sib) swap =
>       (RBTNode focval newrootcolor focleft oldroot) where
>             newrootcolor = if swap then rootcolor else foccolor
>             oldrootcolor = if swap then foccolor else rootcolor
>             oldroot = RBTNode rootval oldrootcolor focright sib
>
```

So, let's try taking a look at how the pivots work. First, we need to construct
some trees to rebalance. We'll just do it manually, since the insert code isn't properly
finished yet.

```>twentyseven = RBTNode 27 Black RBTEmpty RBTEmpty
>twentytwo = RBTNode 22 Black RBTEmpty RBTEmpty
>twentyfive = RBTNode 25 Black twentytwo twentyseven
>sixteen = RBTNode 16 Black RBTEmpty RBTEmpty
>twenty = RBTNode 20 Black sixteen twentyfive
>twelve = RBTNode 12 Black RBTEmpty RBTEmpty
>fifteen = RBTNode 15 Black twelve twenty
>two = RBTNode 2 Black RBTEmpty RBTEmpty
>seven = RBTNode 7 Black RBTEmpty RBTEmpty
>five = RBTNode 5 Black two seven
>ten = RBTNode 10 Black five fifteen
```

That produces a unbalanced binary tree that looks like this:

```RBTNode 10 Black
(RBTNode 5 Black -- 10left
(RBTNode 2 Black RBTEmpty RBTEmpty)  -- 5 left
(RBTNode 7 Black RBTEmpty RBTEmpty)) -- 5 right
(RBTNode 15 Black -- 10 right
(RBTNode 12 Black RBTEmpty RBTEmpty) -- 15 left
(RBTNode 20 Black  -- 15 right
(RBTNode 16 Black RBTEmpty RBTEmpty) -- 20 left
(RBTNode 25 Black -- 20 right
(RBTNode 22 Black RBTEmpty RBTEmpty) -- 25 left
(RBTNode 27 Black RBTEmpty RBTEmpty)))) -- 25 right
```

Let's do a quick test, and try doing a left pivot on the root.

```*Main> rbtPivotLeft ten False
RBTNode 15 Black (RBTNode 10 Black (RBTNode 5 Black (RBTNode 2 Black RBTEmpty RBTEmpty) (RBTNode 7 Black RBTEmpty RBTEmpty)) (RBTNode 12 Black RBTEmpty RBTEmpty)) (RBTNode 20 Black (RBTNode 16 Black RBTEmpty RBTEmpty) (RBTNode 25 Black (RBTNode 22 Black RBTEmpty RBTEmpty) (RBTNode 27 Black RBTEmpty RBTEmpty)))
*Main>
```

Cleaned up, that looks like this:

```RBTNode 15 Black
(RBTNode 10 Black
(RBTNode 5 Black
(RBTNode 2 Black RBTEmpty RBTEmpty)
(RBTNode 7 Black RBTEmpty RBTEmpty))
(RBTNode 12 Black RBTEmpty RBTEmpty))
(RBTNode 20 Black
(RBTNode 16 Black RBTEmpty RBTEmpty)
(RBTNode 25 Black
(RBTNode 22 Black RBTEmpty RBTEmpty)
(RBTNode 27 Black RBTEmpty RBTEmpty)))
```

Much better - that's much closer to a balanced tree! So now that we know how to do the
pivot, and we've seen that it works correctly, we can look at building the rebalance code.

With pivots out of the way, we can start looking at how to decide what
operations to do to rebalance the tree. When we're doing an insert, we end up inserting a red node on the bottom of the tree.
It's got two children, both null, which are considered black. If the parent of our new node is
black, then everything is fine; we haven't altered the number of black nodes on any path from
a node to a leaf. So we're done. But if the parent is red, then we've got a red child of a red
node, so we need to do some fixing.

Fixing an imbalance in a red-black tree can (and in fact often will) trigger a cascade of
changes. But in each case, we can look at the specific problem, and fix it, and we'll always
know exactly where the next potential problem is. So we'll look at it in terms of a focal
node
, which is the node causing the immediate problem we're fixing.

The potential cases we can encounter are:

1. The focal node is the root of the tree. In that case, we make it black. That adds
on black node to every path in the tree, which leaves us with a valid tree, so we're
done.
2. The focal node is red, but has a black parent. Again, that's fine. No problem.
3. The focal node is red; it's parent is also red. Then we need to look at its uncle; that is, the node that is the sibling of its parent. If both the new node, the parent and the uncle are all red, then we change the color of the parent and uncle to black,
and the grandparent to red. After this, the grandparent becomes the focal node, and we continue to do our tree-fixing with the new focus.
4. Here's where it gets a bit messy - figuring out the correct pivots. If the focal node and its parent are both red, and the uncle is black, then we need to do a pivot.
1. The focal node is the right child of its parent, and the parent is the left node of the grandparent, then we do a left pivot of the focal node and its parent, and the former parent becomes the new focal node.
2. The focal node is the left child of its parent, and the parent is the right child of the grandparent, then we do a right pivot of the focal node and its parent, and the former parent becomes the new focus.
3. The focal node is the left child of its parent, and the parent is the left child of the grandparent. Then we do a right pivot of the parent and the grandparent and swap the colors of the parent and grandparent. The parent becomes the focus.
4. The focal node is the right child of its parent, and the parent is the right child of the grandparent. Then we do a left pivot of the parent and the grandparent and swap the colors of the parent and grandparent. The parent becomes the focus.

Ok, there's the algorithm for rebalancing. How can we code it in Haskell? We've got a list
of the nodes from the insertion path, in leaf to root order. When we look at the rebalance, we can see that there are a bunch of different cases
which we can separate via pattern matching:

1. The focus is the root of the tree. We can select this case by using
an empty list for the pattern for the ancestors parameter. Once we've gotten to
the root, the tree is balanced, and the only corrective thing we may need to
do is make the root black. So:

```>-- Root is focus; no matter what color it is, just make it black
>rbtRebalance (RBTNode v _ left right) [] = RBTNode v Black left right
>rbtRebalance node@(RBTNode v _ left right) (parent@(RBTNode pv pc pl pr):[])
>                        | pv > v = RBTNode pv pc node pr
>      	                 | otherwise = RBTNode pv pc pl node
```
2. Also very simple is the case where the focus is black. In that case, we don't
need to do anything except patch in the insert, and continue up the tree.

```>-- black node - just patch in the change, and climb.
>
>rbtRebalance focus@(RBTNode fv Black left right) (parent@(RBTNode pv pc pl pr):ancestors)
>            | pv > fv = rbtRebalance (RBTNode pv pc focus pr) ancestors
>      	     | otherwise = rbtRebalance (RBTNode pv pc pl focus) ancestors
>
```
3. Next, we've got the case of a red node with a black parent. We can identify it by using
"```RBTNode v Red left right" as a pattern for the focus, and "RBTNode _ Black _ _" as a pattern for the parent. A red node with a black parent is OK, as long as the subtree under the red is balanced; and since we're balancing from the bottom up, we know that everything beneath this node is balanced. So: >rbtRebalance focus@(RBTNode fv Red left right) (parent@(RBTNode pv Black pl pr):ancestors) = > rbtRebalance (reconstructNode focus parent Black) ancestors ```
4. ``` Now we're getting to the interesting cases, which are the cases where both the node and its parent are red. We can separate two cases here: cases where we'll fix using a pivot, and cases where we'll fix using a recoloring. The way to distinguish them is by looking at the uncle of the focus node; that is, the sibling of the nodes parent. The red-red case is complicated enough that instead of writing out huge pattern expressions, we'll simplify it by separating the function into several layers of calls, each of which does a phase of the pattern match. We want to separate out the cases where we've got a red node with a red parent and a red uncle, and the cases where we've got a red node with a red parent and a black uncle. If the focus, its parent, and its uncle are all red, then we're in a recoloring case; if the focus and its parent are red, and the uncle is black, then we're in a pivot case. >rbtRebalance focus@(RBTNode v Red left right) (parent@(RBTNode _ Red _ _):ancestors) = > rebalanceRedRedNode focus parent ancestors To be able to recognize sub-cases when we have a red node/red parent, we need to be able to look at the path from the grandparent to the focus, and the color of the uncle. So we'll write some helper functions to get those. >uncleColor node parent grandparent = > if (parent == rbtLeftChild grandparent) > then rbtColor (rbtRightChild grandparent) > else rbtColor (rbtLeftChild grandparent) > >data TwoStepPath = LeftLeft | LeftRight | RightLeft | RightRight > >pathFromGrandparent :: (Ord a) => RedBlackTree a -> RedBlackTree a -> RedBlackTree a -> TwoStepPath >pathFromGrandparent node@(RBTNode v _ l r) parent@(RBTNode pv _ pl pr) grand@(RBTNode gv _ gl gr) > | pv < gv && v | pv >= gv && v | pv = pv = LeftRight > | pv >= gv && v >= pv = RightRight To actually handle the red node/red parent, first we separate out the case where the red parent is the root of the tree - there are no more ancestors on the insertion path. In that case, we can just climb to root, and do the correction from there.</p > >-- node is red, parent is red, but parent is root: just go to parent(root), and fix >-- from there. >rebalanceRedRedNode focus@(RBTNode fv fc fl fr) parent@(RBTNode pv pc pl pr) [] = > rbtRebalance (reconstructNode focus parent Red) [] Otherwise, we need to check whether the uncle was red or black. If it was black, we do a recolor correction; if it was red, we figure out what kind of pivot to do. We'll use a bunch of helper function to make it easy. >rebalanceRedRedNode focus parent (grand@(RBTNode gv gc gl gr):ancestors) = > if (uncleColor focus parent grand) == Red > then recolorAndContinue focus parent grand ancestors > else case (pathFromGrandparent focus parent grand) of > LeftLeft -> rbtRebalance (pivotGrandparentRight focus parent grand) ancestors > LeftRight -> rbtRebalance (pivotParentLeft focus parent) (grand:ancestors) > RightLeft -> rbtRebalance (pivotParentRight focus parent) (grand:ancestors) > RightRight -> rbtRebalance (pivotGrandparentLeft focus parent grand) ancestors The code above is really just using patterns for case selection. The actual work is in the helper functions that get called. They're all simple functions. First, we have some custom pivot functions - one for each direction for pivoting around a parent (the cases where the node is left of the parent, and the parent is right of the grandparent, or vise versa), and one for each direction pivoting around a grandparent (both node and parent are left children, or both are right children). >pivotGrandparentLeft node parent@(RBTNode pv pc pl pr) grand@(RBTNode gv gc gl gr) = > rbtPivotLeft (RBTNode gv gc gl (RBTNode pv pc pl node)) True > >pivotGrandparentRight node parent@(RBTNode pv pc pl pr) grand@(RBTNode gv gc gl gr) = > rbtPivotRight (RBTNode gv gc (RBTNode pv pc node pr) gr) True > >pivotParentLeft node parent@(RBTNode pv pc pl pr) = > rbtPivotLeft (RBTNode pv pc pl node) False > >pivotParentRight node parent@(RBTNode pv pc pl pr) = > rbtPivotRight (RBTNode pv pc node pr) False And a function to do the recoloring for when the uncle was red: >recolorAndContinue focus@(RBTNode v c l r) parent@(RBTNode pv pc pl pr) grand@(RBTNode gv gc gl gr) ancestors = > let path = pathFromGrandparent focus parent grand > uncle = (case path of > LeftLeft -> gr > LeftRight -> gr > RightLeft -> gl > RightRight -> gl) > newUncle = if (uncle == RBTEmpty) > then RBTEmpty > else (RBTNode (rbtValue uncle) Black (rbtLeftChild uncle) (rbtRightChild uncle)) > newparent = reconstructNode focus parent Black > newGrandParent = (case path of > LeftLeft -> (RBTNode gv Red newparent newUncle) > LeftRight -> (RBTNode gv Red newparent newUncle) > RightLeft -> (RBTNode gv Red newUncle newparent) > RightRight -> (RBTNode gv Red newUncle newparent)) > in rbtRebalance newGrandParent ancestors ```
``` And that, finally, is it. For the binary search tree without balancing code, the worst case is inserting a list of values in order. So let's try that, to see how well this works. *Main> foldl ( x y -> rbtInsert x y) RBTEmpty [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] RBTNode 4 Black (RBTNode 2 Black (RBTNode 1 Black RBTEmpty RBTEmpty) (RBTNode 3 Black RBTEmpty RBTEmpty)) (RBTNode 8 Red (RBTNode 6 Black (RBTNode 5 Black RBTEmpty RBTEmpty) (RBTNode 7 Black RBTEmpty RBTEmpty)) (RBTNode 12 Black (RBTNode 10 Red (RBTNode 9 Black RBTEmpty RBTEmpty) (RBTNode 11 Black RBTEmpty RBTEmpty)) (RBTNode 14 Red (RBTNode 13 Black RBTEmpty RBTEmpty) (RBTNode 15 Black RBTEmpty (RBTNode 16 Red RBTEmpty RBTEmpty))))) Since that's completely illegible, let's clean it up, and look at it in picture form: The shortest path from root to leaf is [4,2,1]; the longest is [4,8,12,14,15,16]. Just like we promised: the longest is no more than twice the shortest. It's a pretty good search tree, and the rebalancing work isn't terribly expensive, and amortizes nicely over a long run of inserts. The insert time ends up amortizing to **O**(lg n), just like the simple binary search tree insert. ```
``` 12 responses so far mattiast says: January 1, 2007 at 3:33 pm In recolorAndContinue there is conditional: "newUncle = if (uncle == RBTEmpty)". Isn't recolorAndContinue called only when uncle is red? In that case it cannot be null, because null is black, so we don't actually have to check that. Reply mws says: January 1, 2007 at 4:54 pm This is one of the best and most flexible data structures around. It is so useful that I implemented this in Java at my last job; we used the red-black trees to implement 'copy-on-write' sets and maps. When you insert or remove an element, you get a new tree while the old one remains perfectly valid (if you don't want the old tree, discard the reference and it gets garbage collected). If anyone wants to know more about functional red-black trees and other functional data structures, then you must get the book "Purely Functional Data Structures" by Chris Okasaki. Reply Pseudonym says: January 1, 2007 at 7:54 pm This is an excellent time to introduce QuickCheck. WARNING: Untested code follows. The conditions that make a tree a red/black tree are easily written in code: -- The root of the tree is Black. prop_Rb1 :: (Ord a) => RedBlackTree a -> Bool prop_Rb1 t = rbtColor t == Black -- All brances of a tree end in a null which is black. prop_Rb2 :: (Ord a) => RedBlackTree a -> Bool prop_Rb2 (RBTNode _ _ l r) = prop_Rb2 l && prop_Rb2 r prop_Rb2 t = rbtColor t == Black -- All children of red nodes are black. prop_Rb3 :: (Ord a) => RedBlackTree a -> Bool prop_Rb3 (RBTNode _ col l r)   = prop_Rb3 l && prop_Rb3 r   && if col == Red  &nbsp  ;then (rbtColor l == Black && rbtColor r == Black)  &nbsp  ;else True prop_Rb3 t = True -- For all nodes in the tree, all downward paths from the -- node to a leaf contain the same number of Black nodes. prop_Rb4 :: (Ord a) => RedBlackTree a -> Bool prop_Rb4 = {- detail omitted -} -- Don't forget to also add a property checking that the -- tree satisfies the binary search tree condition. prop_RbBST :: (Ord a) => RedBlackTree a -> Bool prop_RbBST = {- detail omitted -} We'll help things out a bit by making a small module to aid the tests: import Test.QuickCheck import RedBlackTree buildtree :: [Int] -> RedBlackTree Int buildTree = foldl rbtInsert RBTEmpty testProp1 = prop_Rb1 . buildtree testProp2 = prop_Rb2 . buildtree testProp3 = prop_Rb3 . buildtree testProp4 = prop_Rb4 . buildtree testPropBST = prop_RbBST . buildtree Now we can test our code: Main> quickCheck testProp1 Reply Pseudonym says: January 1, 2007 at 7:56 pm Yikes! Here's the property code again: -- The root of the tree is Black. prop_Rb1 :: (Ord a) => RedBlackTree a -> Bool prop_Rb1 t = rbtColor t == Black -- All brances of a tree end in a null which is black. prop_Rb2 :: (Ord a) => RedBlackTree a -> Bool prop_Rb2 (RBTNode _ _ l r) = prop_Rb2 l && prop_Rb2 r prop_Rb2 t = rbtColor t == Black -- All children of red nodes are black. prop_Rb3 :: (Ord a) => RedBlackTree a -> Bool prop_Rb3 (RBTNode _ col l r)   = prop_Rb3 l && prop_Rb3 r   && if col == Red  &nbsp  ;then (rbtColor l == Black && rbtColor r == Black)  &nbsp  ;else True prop_Rb3 t = True -- For all nodes in the tree, all downward paths from the -- node to a leaf contain the same number of Black nodes. prop_Rb4 :: (Ord a) => RedBlackTree a -> Bool prop_Rb4 = {- detail omitted -} -- Don't forget to also add a property checking that the -- tree satisfies the binary search tree condition. prop_RbBST :: (Ord a) => RedBlackTree a -> Bool prop_RbBST = {- detail omitted -} And here's the test module: import Test.QuickCheck import RedBlackTree buildtree :: [Int] -> RedBlackTree Int buildTree = foldl rbtInsert RBTEmpty testProp1 = prop_Rb1 . buildtree testProp2 = prop_Rb2 . buildtree testProp3 = prop_Rb3 . buildtree testProp4 = prop_Rb4 . buildtree testPropBST = prop_RbBST . buildtree Reply Ørjan Johansen says: January 1, 2007 at 8:12 pm I think you can avoid some of the non-exhaustive search by including the color in the static type, something like: data RBTree a = BTree (BTree a) | RNode a (BTree a) (BTree a) data BTree a = BTEmpty | BNode a (RBTree a) (RBTree a) Reply Mark C. Chu-Carroll says: January 1, 2007 at 8:23 pm Ørjan: That's an *excellent* suggestion. It's actually rather stupid of me to *not* have implemented it that way. Reply Craig Stuntz says: January 2, 2007 at 12:17 pm we'll use the tail-calling version of tree insert that I showed in this post The "this post" link goes back to this post instead of the one you intended, I think. Reply Mark C. Chu-Carroll says: January 2, 2007 at 12:26 pm Craig: Thanks for catching that! I write my posts on my macbook using TextMate, and then upload them to the MovableType system we use at SB. Somehow, in the course of uploading the post to MT, I deleted the URL in that link. I'd love to know how I managed it, but in any case, it's fixed now. Reply Stephen says: January 3, 2007 at 2:08 pm It was the HFS file system on the Mac that turned me off to balanced b-trees. Even for large directories, file lookups in Unix are faster (on similar hardware). That's where the balanced trees should shine. Unbalanced binary trees can be really fast. They can be more or less balanced if you can randomize the order of your input. Typically, this can be accomplished in constant time. It's even easy if you have plenty of RAM. Plenty of RAM is a relative to the size of the problem. Now, at home, i have half a GB of RAM. So problems up to nearly that size are "small". Since RAM can be 10,000 times faster than disk, and reading large chunks of disk can be 50 times faster than random access, there can be great incentive to load the entire data set and use stupid sequential searches to achieve your goals. It's one thing to use good algorithms to achieve speed. It's another thing altogether to measure the speed you get, and try it more than one way. Reply Mark C. Chu-Carroll says: January 3, 2007 at 2:23 pm Stephen: This is about balanced *binary trees*, not *b-trees*. Red-black trees are used mainly for in-memory storage, in order to keep a binary tree balanced. The kinds of randomization you mention will typically end up being more expensive that the red-black tree rebalancing. B-trees *do* work astonishingly well when they're applied correctly. But that "applied correctly" is the big trick. If you're not careful about how/where you use them, you can create a total performance disaster. But plenty of people use them to make blazingly fast systems. For a few examples: * ReiserFS on linux is based on b-trees, and for many tasks, particularly tasks involving large numbers of small files, Reiser kills other filesystems in performance. * Pretty much every major high performance database system whether open-source (MySQL, PostgreSQL, BerkeleyDB), or proprietory (Oracle,DB2) use b-trees for storage in at least some tables. Reply Nikolas Coukouma says: January 6, 2007 at 8:23 am I recommend checking out Stefan Kahrs' implementations, which improve upon the work done by Chris Okasaki. Reply Nikolas Coukouma says: January 7, 2007 at 1:19 pm Hrm, it's worth noting that red-black trees are based on the logic 2-3-4 trees, which is the lowest order of B-tree. Red-black trees are a more efficient representation; i say "representation" because for every 2-3-4 tree, there exists a red-black tree with the nodes in the same order. Reply Leave a Reply Name (required) Email (required) Website ```
``` ```
``` Scientopia Blogs Select a Blog Adventures in Ethics and Science Attack Polymerase Balanced Instability Book of Trogool Candid Engineer in Academia Chemical BiLOLogy Child's Play Christina's LIS Rant Complex Roots Data Hound The Difference Engine Drugmonkey Eagles RISE Everyday Biology Fumbling Towards Tenure Track Galactic Interactions Guest Blog Good Math/Bad Math Hardtack and Sardines InBabyAttachMode Mistress of the Animals Neurodynamics Neurotic Physiology Pondering Blather Professor in Training Prof-Like Substance Sanitized for Your Protection Science Professor Skulls in the Stars Suburban Stone Age Take it to the Bridge Tales of the Genomic Repairman The Hermitage The Meandering Scholar The Questionable Authority The Urban Ethnographer This Scientific Life Thus Spake Zuska Voltage Gate White Coat Underground WhizBANG /* <![CDATA[ */ var blogsdropdown = document.getElementById("sciblogs"); function onBlogChange() { if ( blogsdropdown.options[blogsdropdown.selectedIndex].value != "" ) { location.href = "http://scientopia.org" + blogsdropdown.options[blogsdropdown.selectedIndex].value; } } blogsdropdown.onchange = onBlogChange; /* ]]> */ <!-- google_ad_client = "ca-pub-2721062361340994"; /* Scientopia Sidebar */ google_ad_slot = "2270592141"; google_ad_width = 120; google_ad_height = 600; //--> Recent Posts Farewell, Scientopia! Hello goodmath.org! Hello World in ARM Assembly Language Everyone stop implementing programming languages, right now! It's been solved! On outing in the sciblogging community Oy Veh! Power Series, Analytic Continuations, and Riemann Zeta Recent Comments E. E. Escultura on E. E. Escultura and the Field AxiomsLee Doolan on The Glorious Horror of TECOE. E. Escultura on Turing Crackpottery!E. E. Escultura on Turing Crackpottery!E. E. Escultura on E. E. Escultura and the Field Axioms Archives February 2014 January 2014 December 2013 November 2013 October 2013 September 2013 August 2013 July 2013 June 2013 May 2013 April 2013 March 2013 February 2013 January 2013 December 2012 November 2012 October 2012 September 2012 August 2012 July 2012 June 2012 May 2012 April 2012 March 2012 February 2012 January 2012 December 2011 November 2011 October 2011 September 2011 August 2011 July 2011 June 2011 May 2011 April 2011 March 2011 February 2011 January 2011 December 2010 November 2010 October 2010 September 2010 August 2010 July 2010 June 2010 May 2010 April 2010 March 2010 February 2010 January 2010 December 2009 November 2009 October 2009 September 2009 August 2009 July 2009 June 2009 May 2009 April 2009 March 2009 February 2009 January 2009 December 2008 November 2008 October 2008 September 2008 August 2008 July 2008 June 2008 May 2008 April 2008 March 2008 February 2008 January 2008 December 2007 November 2007 October 2007 September 2007 August 2007 July 2007 June 2007 May 2007 April 2007 March 2007 February 2007 January 2007 December 2006 November 2006 October 2006 September 2006 August 2006 July 2006 June 2006 Categories Abstract Algebra Bad Algebra Bad Economics Bad Logic Bad Math Bad Math Education Bad Physics Bad Probability Bad Software Bad Statistics Basics Book Cantor Crankery Category Theory Chatter classics climate complex analysis Computation Cryptography Data Structures Debunking Creationism Distributed Systems Egnorance Erlang fundamentalism Game Theory Good Math Good Physics goodmath Graph Theory Group Theory Haskell HIV denial Incompleteness Intelligent Design lambda calculus Logic Machine Language manual computing devices Meta Music Numbers Numerology Obfuscatory Math pathological programming People Personal Physics politics Politics probability Programming Recipes sexism Society statistics topology Uncategorized Warming woo Meta Log in Entries RSS Comments RSS WordPress.org ```
``` ```