Home > Blockchain >  Should B-Tree nodes contain a pointer to their parent (C implementation)?
Should B-Tree nodes contain a pointer to their parent (C implementation)?

Time:11-28

I am trying to implement a B-tree and from what I understand this is how you split a node:

  1. Attempt to insert a new value V at a leaf node N
  2. If the leaf node has no space, create a new node and pick a middle value of N and anything right of it move to the new node and anything to the left of the middle value leave in the old node, but move it left to free up the right indices and insert V in the appropriate of the now two nodes
  3. Insert the middle value we picked into the parent node of N and also add the newly created node to the list of children of the parent of N (thus making N and the new node siblings)
  4. If the parent of N has no free space, perform the same operation and along with the values also split the children between the two split nodes (so this last part applies only to non-leaf nodes)
  5. Continue trying to insert the previous split's middle point into the parent until you reach the root and potentially split the root itself, making a new root

This brings me to the question - how do I traverse upwards, am I supposed to keep a pointer of the parent?
Because I can only know if I have to split the leaf node when I have reached it for insertion. So once I have to split it, I have to somehow go back to its parent and if I have to split the parent as well, I have to keep going back up.
Otherwise I would have to re-traverse the tree again and again each time to find the next parent.

Here is an example of my node class:

template<typename KEY, typename VALUE, int DEGREE>
struct BNode
{
    KEY Keys[DEGREE];
    VALUE Values[DEGREE];

    BNode<KEY, VALUE, DEGREE>* Children[DEGREE   1];
    BNode<KEY, VALUE, DEGREE>* Parent;

    bool IsLeaf;
};

(Maybe I should not have an IsLeaf field and instead just check if it has any children, to save space)

CodePudding user response:

You don't need parent pointers if all your operations start at the root.

I usually code the insert recursively, such that calling node.insert(key) either returns null or a new key to insert at its parent's level. The insert starts with root.insert(key), which finds the appropriate child and calls child.insert(key).

When a leaf node is reached the insert is performed, and non-null is returned if the leaf splits. The parent would then insert the new internal key and return non-null if it splits, etc. If root.insert(key) returns non-null, then it's time to make a new root

CodePudding user response:

Even if you don't use recursion or an explicit stack while going down the tree, you can still do it without parent pointers if you split nodes a bit sooner with a slightly modified algorithm, which has this key characteristic:

When encountering a node that is full, split it, even when it is not a leaf.

With this pre-emptive splitting algorithm, you only need to keep a reference to the immediate parent (not any other ancestor) to make the split possible, since now it is guaranteed that a split will not lead to another, cascading split more upwards in the tree. This algorithm requires that the maximum degree (number of children) of the B-tree is even (as otherwise one of the two split nodes would have too few keys to be considered valid).

See also Wikipedia which describes this alternative algorithm as follows:

An alternative algorithm supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way preemptively. This prevents the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this algorithm, we must be able to send one element to the parent and split the remaining

  • Related