Creating a data structure of integers and finding which component a given integer lies in-CodePudding

I have a set of 32 bit integers with which I will be making a data structure that will put these integers in different components ( like dsu ) the join operation has to be performed using bitwise and, if the bitwise and is not zero of two integers they lie in the same component, now given an integer how do I get ( assuming the data structure is already created for the prefix of this current integer ) the component in which this integer will lie? ex say [1,5,3,8] are given set of integers than (1,5,8) lie in one component because lsb for any pair will ensure non zero "bit and" and (8) will lie in it's own component, Or quite possibly entirely different algorithm that is much simpler if you can think of,

Edit :

It is possible however that two integers have bit and as zero but if third integer is introduced which has pairwise bit and non zero with both of them then all three will lie in same component ex [2,12, 6] 2 and 12 on operation give 0 but 2 and 6 do not and 12 and 6 do not so 6 will unify all three in one component.

Edit 2 : I am able to think of a naive algorithm that uses 32 length array for which one entry consists of linked list like structure ( like adjecentcy list ) that is if i'th bit is set then index i will contain all the integers with i'th set bit, but it could give me O(n) search, I'm thinking more towards dsu, but finding it difficult how to create it.

CodePudding user response：

The requirements imply that an incoming integer x should be joined with component C if and only if the bitwise AND of x with the (bitwise OR of all numbers in C) is nonzero.

A disjoint-set data structure would work well for this. The 'root' of each component could also maintain the bitwise OR of all of its members, which can be done with a separate hashmap, although this isn't strictly necessary.

You should also keep a hashmap from each bit-position to any member of the (unique, if it exists) component that has that bit-position set. When adding a new integer, scan over all of its set-bits: use this hashmap to find all the components to union, or whether to make a new component. This should give almost constant-time updates. There are also at most 1 ceil(log_2(M)) different components at any time, where M is the largest integer you've seen, which would be at most 33.

If you want to avoid some of the complexity of the Disjoint-Set code, you could go with Paul Hankin's suggestion of a list of components. Depending on your programming language and whether you want to count duplicates, a component could be represented as a (bitmask, multiset) pair, where the bitmask is the bitwise OR of all members, and the multiset of members lets you print all members of a component and test for containment in O(1) time, although insertions could take longer (but still constant amortized time).

If you never need to print the members of a component or check whether a particular integer was ever actually added, it suffices to just store the bitmasks.

CodePudding user response：

Here's some go code (runnable link) that represents components as a list of disjoint bitmasks. A previously seen integer is in the component that it has non-zero bitwise AND with (or the zero component if it is zero).

The list of components can be updated with a new integer by finding all components that it has non-zero bitwise AND with, and replacing them with a single component that is the bitwise OR of those components and the new element.

There is at most one component per bit in any input value, plus one if there's a zero component. That's 32 components at most here, because the input values are stated to be 32-bits.

package main

import (
    "fmt"
)

// makeComponents returns a slice of disjoint bitmasks
// that represent the different components in the input.
func makeComponents(xs []uint32) []uint32 {
    var cs []uint32
    for _, x := range xs {
        j := 0
        for i, c := range cs {
            cs[j] = cs[i]
            if c & x != 0 || (x == 0 && c == 0) {
                x |= c
            } else {
                j  
            }
        }
        cs = append(cs[:j], x)
    }
    return cs
}

// component returns the component (from cs) that x is in.
func component(cs []uint32, x uint32) int {
    for i, c := range cs {
        if x==c || x & c != 0 {
            return i
        }
    }
    panic("no component")
}

// showComponents outputs the input values with those in the same
// component on the same line.
func showComponents(xs []uint32) {
    cs := makeComponents(xs)
    xperc := make([][]uint32, len(cs))
    for _, x := range xs {
        c := component(cs, x)
        xperc[c] = append(xperc[c], x)
    }
    for _, xc := range xperc {
        fmt.Println(xc)
    }
}

func main() {
    xs := []uint32{0, 1, 3, 6, 8, 16, 48, 65}
    showComponents(xs)
}