Count the number of times that a binary subset appears in a list of lists, Python-CodePudding

Hello, I'm trying to figure out a way to count the number of times that a subset appears in a list of lists. For example, if I have the following list:

dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]

The pattern [0,0,1,0,1,0] appears in three of the four items of the list. I want to be able to count the number of times that the pattern appears.

So far I've tried this, but it does not work:

subsets_count = []
for i in range(len(dataset)):
    current_subset_count = 0
    for j in range(len(dataset)):
        if dataset[i] in dataset[j]:
            subset_count += 1

    subsets_count.append(current_subset_count)

I'd really appreciate any intuition or help on how to solve it, thanks!

CodePudding user response：

For each sublist, generate a set of indices where the ones exist. Do the same for the pattern. Then, for each set of indices, find whether the pattern indices are a subset of that set. If so, the pattern is in the sublist.

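# dataset and pattern as given in the question
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]

one_indices_of_subsets = [{i for i, v in enumerate(sublist) if v} for sublist in dataset]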
pattern_indices = {i for i, v in enumerate(pattern) if v}

result = sum(1 for s in one_indices_of_subsets if pattern_indices <= s)

print(result)

This outputs (with the dataset and pattern from the question):

3
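The same idea can be wrapped in a small reusable function. This is only a sketch: the name `count_containing` is made up for illustration, and it assumes the sublists and the pattern contain only 0s and 1s, as in the question.

```python
def count_containing(dataset, pattern):
    """Count sublists that have a 1 everywhere the pattern has a 1."""
    # Positions of the ones in the pattern.
    pattern_indices = {i for i, v in enumerate(pattern) if v}
    count = 0
    for sublist in dataset:
        # Positions of the ones in this sublist.
        one_indices = {i for i, v in enumerate(sublist) if v}
        # On sets, <= tests "is a subset of".
        if pattern_indices <= one_indices:
            count += 1
    return count


dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
print(count_containing(dataset, [0,0,1,0,1,0]))  # 3
```

Calling it once per sublist, e.g. `[count_containing(dataset, d) for d in dataset]`, gives one count per item, which seems to be what the loop in the question was aiming for.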

CodePudding user response：

If you want to count exact occurrences of the pattern (taking the order of its elements into account), you can simply use the list .count() method, as follows:

dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]

num_count = dataset.count([0,0,1,0,1,0])

print(num_count)

output:

2

And if you don't care about the order of the zeros and ones, you can use:

dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]

num_count = [sum(el) for el in dataset].count(sum([0,0,1,0,1,0]))

print(num_count)

output2:

3
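One thing to watch with the sum-based variant: comparing sums only checks how many ones each sublist has, not where they are. A quick check on the same data shows which sublists get counted (the names `pattern` and `same_sum` are just for this sketch):

```python
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
pattern = [0,0,1,0,1,0]

# Keep the sublists that have the same number of ones as the pattern.
same_sum = [el for el in dataset if sum(el) == sum(pattern)]
print(same_sum)
# [[0, 0, 1, 0, 1, 0], [0, 1, 1, 0, 0, 0], [0, 0, 1, 0, 1, 0]]
```

Note that [0,1,1,0,0,0] is counted even though its ones sit at different positions than in the pattern, so this only fits if position really does not matter.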

CodePudding user response：

Try:

dataset = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 0],
]

pat = [0, 0, 1, 0, 1, 0]

cnt = sum(all(a == b for a, b in zip(pat, d) if a == 1) for d in dataset)
print(cnt)

Prints:

3
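If the rows are always made of 0s and 1s, a similar check can be done with integers instead of zip. Note this is a different technique (bitmasks), not what the snippet above does, and `to_mask` is a helper name invented for this sketch:

```python
def to_mask(bits):
    """Pack a list of 0/1 values into an int (first element = highest bit)."""
    mask = 0
    for b in bits:
        mask = (mask << 1) | b
    return mask


dataset = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 0],
]
pat = [0, 0, 1, 0, 1, 0]
pat_mask = to_mask(pat)

# A row contains the pattern when every 1-bit of the pattern is also set in the row.
cnt = sum((to_mask(d) & pat_mask) == pat_mask for d in dataset)
print(cnt)  # 3
```

This assumes all rows have the same length as the pattern; otherwise the bit positions would not line up.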

CodePudding user response：

This allows one digit to differ from the pattern. A straightforward pattern matcher:

dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]

pattern = [0,0,1,0,1,0]
m = len(pattern)
subsets_count = 0

for i in range(len(dataset)):
    count = 0
    for j in range(m):
        if dataset[i][j] == pattern[j]:
            count += 1
    if count >= m-1:
        subsets_count += 1

print(subsets_count)

Output: 3
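The allowed number of differing digits can also be made a parameter. A minimal sketch, assuming every row has the same length as the pattern; `count_near_matches` and `max_mismatches` are names chosen here for illustration:

```python
def count_near_matches(dataset, pattern, max_mismatches=1):
    """Count rows that differ from pattern in at most max_mismatches positions."""
    total = 0
    for row in dataset:
        # Number of positions where row and pattern disagree (Hamming distance).
        mismatches = sum(a != b for a, b in zip(row, pattern))
        if mismatches <= max_mismatches:
            total += 1
    return total


dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]

print(count_near_matches(dataset, pattern))                    # 3
print(count_near_matches(dataset, pattern, max_mismatches=0))  # 1
```

With `max_mismatches=0` this reduces to counting exact matches.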

CodePudding user response：

Using one of my favorite itertools, compress:

from itertools import compress

# For each d, count the items e that have a 1 wherever d has a 1
[sum(all(compress(e, d)) for e in dataset)
 for d in dataset]

Results in:

[3, 1, 1, 1]