Hello, I'm trying to figure out a way to count the number of times that a subset appears in a list of lists. For example, if I have the following list:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
The pattern [0,0,1,0,1,0] appears in three of the four items of the list (each of those items has a 1 wherever the pattern has a 1). I want to be able to count the number of times that the pattern appears.
So far I've tried this, but it does not work:
subsets_count = []
for i in range(len(dataset)):
    current_subset_count = 0
    for j in range(len(dataset)):
        if dataset[i] in dataset[j]:
            subset_count = 1
    subsets_count.append(current_subset_count)
I'd really appreciate any intuition or help on how to solve it. Thanks!
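The loop above has two problems: `dataset[i] in dataset[j]` asks whether the whole list `dataset[i]` is an element of `dataset[j]` (never true for a list of ints), and the counter `current_subset_count` is never incremented because the loop assigns to a different name. A minimal corrected sketch of the same nested-loop idea, assuming "appears in" means every 1 in one sublist is also a 1 in the other:

```python
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]

subsets_count = []
for i in range(len(dataset)):
    current_subset_count = 0
    for j in range(len(dataset)):
        # dataset[i] "appears in" dataset[j] if every position that is 1
        # in dataset[i] is also 1 in dataset[j]
        if all(b == 1 for a, b in zip(dataset[i], dataset[j]) if a == 1):
            current_subset_count += 1
    subsets_count.append(current_subset_count)

print(subsets_count)  # [3, 1, 1, 1]
```

The first entry, 3, is the count asked about for [0,0,1,0,1,0]; the other entries treat each remaining sublist as the pattern in turn.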
Answer:
For each sublist, generate a set of indices where the ones exist. Do the same for the pattern. Then, for each set of indices, find whether the pattern indices are a subset of that set. If so, the pattern is in the sublist.
pattern = [0,0,1,0,1,0]
one_indices_of_subsets = [{i for i, v in enumerate(sublist) if v} for sublist in dataset]
pattern_indices = {i for i, v in enumerate(pattern) if v}
# pattern_indices <= s is True when every index of the pattern's 1s is also in s
result = sum(1 for s in one_indices_of_subsets if pattern_indices <= s)
print(result)
This outputs:
3
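The same subset test can also be sketched with integer bitmasks instead of sets: pack each 0/1 list into an int, and the pattern's 1s are a subset of a row's 1s exactly when `row_mask & pattern_mask == pattern_mask`. The helper name `to_mask` is hypothetical, and the sketch assumes every row has the same length as the pattern (otherwise the bit positions would not line up):

```python
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]

def to_mask(bits):
    """Pack a list of 0/1 values into an int (first element = most significant bit)."""
    mask = 0
    for b in bits:
        mask = (mask << 1) | b
    return mask

pattern_mask = to_mask(pattern)
# AND-ing keeps only the bits set in both; if that equals the pattern,
# every 1 of the pattern is also a 1 in the row
result = sum(1 for row in dataset if (to_mask(row) & pattern_mask) == pattern_mask)
print(result)  # 3
```

This is the set-of-indices idea above expressed with bit operations; it mainly helps when the same rows are tested against many patterns, since each row is packed once.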
Answer:
If you want to count exact occurrences of the pattern (the same values in the same positions), you can simply use the list's .count()
method, as follows:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = dataset.count([0,0,1,0,1,0])
print(num_count)
output:
2
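If you need the exact-match count for every distinct sublist at once, a sketch using `collections.Counter` avoids calling `.count()` repeatedly. Lists are unhashable, so each sublist is converted to a tuple first:

```python
from collections import Counter

dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]

# Convert each sublist to a tuple so it can be used as a Counter key
counts = Counter(tuple(sublist) for sublist in dataset)

print(counts[(0,0,1,0,1,0)])  # 2
print(counts.most_common(1))  # [((0, 0, 1, 0, 1, 0), 2)]
```

Looking up a pattern that never occurs returns 0 rather than raising a KeyError.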
And if you don't care about the order of the 0s and 1s (i.e. only the number of 1s matters), you can use:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
num_count = [sum(el) for el in dataset].count(sum([0,0,1,0,1,0]))
print(num_count)
output:
3
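Exact matching, "same number of 1s", and "every 1 of the pattern is also a 1 in the row" are three different questions and give different counts on the same five-row dataset. A short sketch comparing them (the third is the interpretation used in the question):

```python
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0],[0,0,1,0,1,0]]
pattern = [0,0,1,0,1,0]

# 1. Exact match: same values in the same positions
exact = dataset.count(pattern)

# 2. Same number of 1s, positions ignored (the sum-based approach)
same_ones = sum(1 for row in dataset if sum(row) == sum(pattern))

# 3. Every 1 in the pattern is also a 1 in the row (extra 1s allowed)
contains = sum(all(r == 1 for p, r in zip(pattern, row) if p == 1) for row in dataset)

print(exact, same_ones, contains)  # 2 3 4
```

For example, [0,1,1,0,0,0] is counted by the sum-based test but not by the third one, while [0,0,1,0,1,1] is counted by the third test but not by the sum-based one.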
Answer:
Try:
dataset = [
[0, 0, 1, 0, 1, 0],
[0, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0],
[0, 1, 1, 0, 0, 0],
]
pat = [0, 0, 1, 0, 1, 0]
cnt = sum(all(a == b for a, b in zip(pat, d) if a == 1) for d in dataset)
print(cnt)
Prints:
3
Answer:
A straightforward pattern matcher that allows one digit to differ from the pattern:
dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]
m = len(pattern)
subsets_count = 0
for i in range(len(dataset)):
    # count how many positions of this sublist agree with the pattern
    count = 0
    for j in range(m):
        if dataset[i][j] == pattern[j]:
            count += 1
    # accept the sublist if at most one position differs
    if count >= m - 1:
        subsets_count += 1
print(subsets_count)
Output: 3
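The same idea generalizes to any number of allowed differences: count the positions where a row and the pattern differ (their Hamming distance) and accept the row if that number is small enough. A sketch with a hypothetical helper `count_near_matches` and a `max_mismatches` parameter, assuming every row has the same length as the pattern (`zip` would silently truncate otherwise):

```python
def count_near_matches(dataset, pattern, max_mismatches=1):
    """Count rows that differ from `pattern` in at most `max_mismatches` positions.

    Assumes every row has the same length as `pattern`.
    """
    count = 0
    for row in dataset:
        # Hamming distance: number of positions where the values differ
        mismatches = sum(1 for a, b in zip(row, pattern) if a != b)
        if mismatches <= max_mismatches:
            count += 1
    return count

dataset = [[0,0,1,0,1,0],[0,0,1,0,1,1],[1,0,1,0,1,0],[0,1,1,0,0,0]]
pattern = [0,0,1,0,1,0]

print(count_near_matches(dataset, pattern))     # 3
print(count_near_matches(dataset, pattern, 0))  # 1
print(count_near_matches(dataset, pattern, 2))  # 4
```

With `max_mismatches=1` this reproduces the loop above; note that a tolerance-based match also accepts a row that is *missing* one of the pattern's 1s, which the subset-based answers would reject.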
Answer:
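Using one of my favorite itertools, compress. For each sublist d used as the pattern, this counts the sublists e that have a 1 wherever d has a 1:
from itertools import compress

[sum(all(compress(e, d)) for e in dataset)
 for d in dataset]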
Results in:
[3, 1, 1, 1]