Home > Software design >  Is this Union Find really O(n) as they claim?
Is this Union Find really O(n) as they claim?

Time:03-13

I am solving a problem on LeetCode:

Given an unsorted array of integers nums, return the length of the longest consecutive elements sequence. You must write an algorithm that runs in O(n) time. So for nums = [100,4,200,1,3,2], the output is 4.

The Union Find solution to solve this is as below:

class Solution {
public:
    vector<int> parent, sz;
    
    int find(int i) {
        if(parent[i]==i) return i;
        return parent[i]=find(parent[i]);
    }
    
    void merge(int i, int j) {
        int p1=find(i);
        int p2=find(j);
        
        if(p1==p2) return;
        if(sz[p1]>sz[p2]) {
            sz[p1] =sz[p2];
            parent[p2]=p1;
        } else {
            sz[p2] =sz[p1];
            parent[p1]=p2;
        }
    }

    int longestConsecutive(vector<int>& nums) {
        sz.resize(nums.size(),1);
        parent.resize(nums.size(),0);

        iota(begin(parent),end(parent),0);

        unordered_map<int, int> m;

        for(int i=0; i<nums.size(); i  ) {
            int n=nums[i];
            if(m.count(n)) continue;
            if(m.count(n-1)) merge(i,m[n-1]);
            if(m.count(n 1)) merge(i,m[n 1]);
            m[n]=i;
        }

        int res=0;
        for(int i=0; i<parent.size(); i  ) {
            if(parent[i]==i && sz[i]>res) {
                res=sz[i];
            }
        }

        return res;
    }
};

This gets accepted by the OJ (Runtime: 80 ms, faster than 76.03% of C online submissions for Longest Consecutive Sequence), but is this really O(n), as claimed by many answers, such as this one? My understanding is that Union Find is an O(NlogN) algorithm.

Are they right? Or, am I missing something?

CodePudding user response:

The are right. A properly implemented Union Find with path compression has an amortized linear run time complexity.

The key part is path compression and it happens here in your code:

return parent[i]=find(parent[i])

vs the following that doesn't employ path compression:

return find(parent[i])

What this part of the code does is that it flattens the structure of the nodes in the hierarchy and links each node directly to the final root. Only in the first run of find will you traverse the whole structure. The next time you'll get a direct hit since you set the node's parent to its ultimate root. Notice that the second code snippet works perfectly fine, but it just does redundant work when you are not interested in the path itself and only in the final root.

  • Related