Home > OS >  Finding Closest Pair
Finding Closest Pair

Time:02-15

I have an AVL tree, in which I have to find the closest pair, as in the values of the two nodes that have the least difference. No duplicate values, and has to be completed under O(log n).

Example: Insert(9),Insert(2),Insert(14), Insert(10).

The closest pair in this tree is (9,10).

But I am unsure how to implement this.

I only know that, each node's closest pair can be calculated by by taking the minimum of the largest value on the left and the node, or the smallest value on the right. But if I were to calculate this for every node it will be over logn for sure.

Any ideas?

Edit: forgot to mention that I am designing the insert function myself, so I can make changes to each node so it can contain more info, in regard of the closest pair function

CodePudding user response:

An important observation is that the nodes forming the "best" pair cannot belong to the different subtrees - at any level. Obviously, each of them is closer to the common ancestor than to each other. It means that the best pair is always formed by a certain node and some of its descendants.

Therefore, the newly inserted node may improve the record only along its search path. It is also important to keep in mind that this newcomer always ends up as a leaf.

In pseudocode:

node * best[2];

node * insert(node * root, int value)
{
    node * n = new_node(value);
    root = normal_avl_insert(root, n);
    update_best(root, n);
    return root;
}

void update_best(node * root, node * n)
{
    int current_record = abs(best[0]->value - best[1]->value);
    while (root->value != n->value) {
        if (abs(root->value - n->value) < current_record) {
            best[0] = root;
            best[1] = node;
        }

        if (n->value < root->value) {
            root = root->left;
        } else {
            root = root->right;
        }
    }
}

Now the query is answered in a constant time. The insertion is still logarithmic, although with a worst asymptotic constant. Perhaps, since you only need to answer query in the logarithmic time, this solution can be improved.

CodePudding user response:

Edit: forgot to mention that I am designing the insert function myself, so I can make changes to each node so it can contain more info, in regard of the closest pair function

That changes everything. If you have no restrictions on the other functions, it's quite easy.

Have two AVL trees, A and B. Your original A stores the value as the value, and the second B tree has the closest pair as data. Apart from storing the closest pair as the value in B, it also stores a pointer to corresponding node in A. Some incomplete code:

struct Anode {
    struct Anode *left, *right;
    int value;
};

struct Bnode {
    struct Bnode *left, *right;
    int value;
    struct Anode *anode;
}

So finding the scp is just finding the smallest element in B. That operation is done in O(log n). And then you have the data ready via the pointer to A tree.

struct Anode *getSmallestSCP(struct Bnode *btree) {
    if(!btree) return NULL;

    if(!(btree->left)) return btree->anode;

    struct Bnode *scp = getSmallestSCP(btree->left);

    return scp->anode;
}

This will mean that every insert and delete is done twice. This does not affect time complexity for worst case. Possibly the average case could be effected. However, I'm not sure the recalculation of B can be done in O(log n). It's quite possible. I simply don't know. But you stated that the problem was finding the scp in O(log n) which this satisfies.

It will also use around twice as much memory. That means that the space complexity is also unaffected.

  • Related