What is the space complexity of BitSet in this scenario?


I am doing a LeetCode problem where I have to find the duplicate in an array whose values are in the range [1, n] inclusive, and I came across this solution:

    public int findDuplicate(int[] nums) {
        BitSet bit = new BitSet();
        for(int num : nums) {
            if(!bit.get(num)) {
                bit.set(num);
            } else {
                return num;
            }
        }
        return -1;
    }

The use of BitSet here is, I'm assuming, similar to using a boolean[] to keep track of whether we have seen the current number before. So my question is: what is the space complexity of this? The runtime seems to be O(n), where n is the size of the input array. Would the same be true for the space complexity?
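For reference, the boolean[] version I have in mind looks roughly like this (it relies on the problem's guarantee that the values lie in [1, n], where n = nums.length - 1):

    public int findDuplicate(int[] nums) {
        // Values are guaranteed to be in [1, n] with n = nums.length - 1,
        // so boolean[nums.length] has a slot for every possible value.
        boolean[] seen = new boolean[nums.length];
        for (int num : nums) {
            if (seen[num]) {
                return num;
            }
            seen[num] = true;
        }
        return -1;
    }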

Link to problem : https://leetcode.com/problems/find-the-duplicate-number/

CodePudding user response:

Your BitSet creates an underlying long[] to store the bits. Reading the code of BitSet#set, I would say it's safe to say that the array will never be longer than max(nums) / 64 * 2 = max(nums) / 32 words. Since long has a fixed size, this comes down to O(max(nums)) space. For this particular problem the values are guaranteed to be in [1, n], so max(nums) ≤ n and the space is O(n); in general, though, if nums can contain large values you can do better with a HashSet.

I tried this out with some simple code, and it seems to corroborate my reading (note that toLongArray() reports the number of words actually in use, which is a lower bound on what has been allocated internally):

    BitSet bitSet = new BitSet();

    bitSet.set(100);
    System.out.println(bitSet.toLongArray().length); // 2 (max(nums) / 32 = 3.125)

    bitSet.set(64000);
    System.out.println(bitSet.toLongArray().length); // 1001 (max(nums) / 32 = 2000)

    bitSet.set(100_000);
    System.out.println(bitSet.toLongArray().length); // 1563 (max(nums) / 32 = 3125)

Note that the factor of 2 I added is conservative; in general the factor will be smaller. That's why my formula consistently over-estimates the actual length of the long array, but never by more than a factor of 2. This is the code in BitSet that made me add it:

    private void ensureCapacity(int wordsRequired) {
        if (words.length < wordsRequired) {
            // Allocate larger of doubled size or required size
            int request = Math.max(2 * words.length, wordsRequired);
            words = Arrays.copyOf(words, request);
            sizeIsSticky = false;
        }
    }

In summary, I would say the bit set is only a good idea if you have reason to believe the values are small relative to how many of them there are. For example, if you have only two values but they are both over a billion, you will needlessly allocate an array of several million longs.
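To make that concrete, here is a small sketch (using the same toLongArray() probe as above, which again slightly under-reports the allocated capacity):

    BitSet bits = new BitSet();
    // Only two values, but both are large: the backing long[] has to
    // reach the word containing the largest bit index.
    bits.set(1_000_000_000);
    bits.set(1_000_000_001);
    int words = bits.toLongArray().length;  // about 15.6 million longs
    System.out.println(words + " longs, roughly " + (words * 8L / (1024 * 1024)) + " MiB");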

Additionally, even when the values stay small, this solution pays extra for sorted (or increasing) input, because BitSet#set keeps reallocating and copying the backing array as the maximum grows. Thanks to the doubling in ensureCapacity the total copying stays amortized at O(max(nums)) rather than becoming quadratic, but you can avoid the repeated allocations entirely: first find the maximum, allocate the necessary length in the BitSet, and only then go through the array.
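A minimal sketch of that pre-sized variant (same signature as the method in the question):

    public int findDuplicate(int[] nums) {
        // First pass: find the largest value so the BitSet can be sized once.
        int max = 0;
        for (int num : nums) {
            max = Math.max(max, num);
        }
        // BitSet(int nbits) allocates enough words for bit indices 0..nbits-1
        // up front, so the set() calls below never grow and copy the array.
        BitSet bit = new BitSet(max + 1);
        // Second pass: same membership check as before.
        for (int num : nums) {
            if (bit.get(num)) {
                return num;
            }
            bit.set(num);
        }
        return -1;
    }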

At this point, using a hash-based set is simpler and fits all situations. If speed really matters, my bet is that the BitSet will beat it under specific conditions (lots of values, but all of them small, and with the bit set pre-sized as described).
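For comparison, the hash-based version (a sketch using java.util.HashSet) is even shorter, and its space depends on how many distinct values you insert, not on how large they are:

    public int findDuplicate(int[] nums) {
        Set<Integer> seen = new HashSet<>();
        for (int num : nums) {
            // add() returns false when the value is already in the set.
            if (!seen.add(num)) {
                return num;
            }
        }
        return -1;
    }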
