How to check if an element exists in 60% of the list of lists?

Time:07-26

I have a list of 5 lists:

X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]

I'm trying to find out how to get a list of the elements that exist in at least 60% of the lists in X.

So I want it to return ['a'], because the element 'a' exists in 3 of the 5 lists, i.e. 'a' exists in 60% of the lists in X.

Answer:

One approach using collections.Counter:

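import math
from collections import Counter
from itertools import chain

X = [['a', 'b', 'c'], ['a', 'd', 'e'], ['a', 'x', 'f'], ['g', 'h', 'j'], ['y', 'u', 'i']]

# Count each element at most once per sublist
counts = Counter(chain.from_iterable(set(li) for li in X))
# Minimum number of sublists; round up so that 60% of, e.g., 7 lists needs 5
threshold = math.ceil(0.6 * len(X))
res = []
# most_common() yields counts in descending order, so stop at the first miss
for key, count in counts.most_common():
    if count >= threshold:
        res.append(key)
    else:
        break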

print(res)

Output

['a']

Note that this solution counts each item at most once per list, because each sublist is converted to a set (set(li)) before counting.
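The threshold has to be rounded up rather than truncated whenever 60% of the list count is not a whole number. The sketch below compares int() and math.ceil() for a few list counts chosen for illustration; thresholds() is a hypothetical helper, not part of any library:

```python
import math

def thresholds(n, fraction=0.6):
    """Hypothetical helper: return (truncated, rounded_up) minimum list counts for n lists."""
    return int(fraction * n), math.ceil(fraction * n)

for n in (5, 7, 10):
    truncated, rounded_up = thresholds(n)
    # With n = 7, truncation gives 4, but 4 / 7 is only about 57%.
    print(n, truncated, rounded_up, truncated / n >= 0.6)
```

For 5 and 10 lists both methods agree; for 7 lists the truncated threshold of 4 would accept elements that appear in fewer than 60% of the lists.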

Answer:

You can count how many nested lists contain each character, keyed by the character, and then check whether each count reaches 60% of the number of lists. Using defaultdict(int) makes every count start at zero.

from collections import defaultdict

X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]

cnt = defaultdict(int)

# Each element is counted at most once per nested list ("for l in set(lst):").
# This matters when an element repeats inside one list. For example, with
# X = [['a','b','c'],['a','a','e'],['b','x','f'],['g','h','j'],['y','u','i']]
# 'a' appears in only 2 of the 5 lists (40%).
# If counting repeats is acceptable, use "for l in lst:" instead; then 'a'
# is counted 3 times and reported as 60% for the example above.

for lst in X:
    for l in set(lst):
        cnt[l] += 1
print(dict(cnt))
# Key order may vary:
# {'a': 3, 'b': 1, 'c': 1,
#  'd': 1, 'e': 1, 'x': 1,
#  'f': 1, 'g': 1, 'h': 1,
#  'j': 1, 'y': 1, 'u': 1, 'i': 1}


res = [k for k,v in cnt.items() if v/len(X) >= 0.6]
print(res)
# ['a']
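To see how much the set() call matters, here is a small comparison on the alternative input mentioned in the code comments above, using collections.Counter (as in the first answer) for brevity. The variable names are illustrative:

```python
from collections import Counter

# Alternative input from the comments: 'a' appears twice in the second list.
X = [['a','b','c'],['a','a','e'],['b','x','f'],['g','h','j'],['y','u','i']]

# Count every occurrence, including repeats inside the same sublist.
all_occurrences = Counter(item for lst in X for item in lst)
# Count each item at most once per sublist.
per_list = Counter(item for lst in X for item in set(lst))

print(all_occurrences['a'] / len(X))  # 0.6
print(per_list['a'] / len(X))         # 0.4
```

Only the per-list count reflects "present in 60% of the lists"; counting every occurrence overstates elements that repeat within a sublist.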

Answer:

A concise approach is to loop over the unique elements of X in a list comprehension and check whether each element is present at least once in at least 60% of the sub-arrays. This requires every sublist to have the same length, so that X forms a 2-D NumPy array.

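import numpy as np

X = np.array([["a","b","c"],["a","d","e"],["a","x","f"],["g","h","j"],["y","u","i"]])
# (X == element).any(axis=1) marks the rows containing element;
# .mean() of that boolean array is the fraction of such rows
res = [str(element) for element in np.unique(X) if (X == element).any(axis=1).mean() >= 0.6]
print(res)
# ['a']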
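For readers unfamiliar with the NumPy idiom, the sketch below breaks the condition down for a single element, 'a', assuming the same rectangular X:

```python
import numpy as np

X = np.array([["a","b","c"],["a","d","e"],["a","x","f"],["g","h","j"],["y","u","i"]])

mask = (X == "a")            # element-wise comparison, shape (5, 3)
row_hits = mask.any(axis=1)  # one boolean per row: does this sublist contain "a"?
print(row_hits.tolist())     # [True, True, True, False, False]
print(row_hits.mean())       # 0.6, the fraction of rows containing "a"
```

Because True counts as 1 and False as 0, the mean of the per-row booleans is exactly the fraction of sublists that contain the element.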

Answer:

One approach without any imports:

X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]

counts = {}                               # Counts of each element
for x in X:                               # Iterate through the sublists of X
    for y in set(x):                      # Count each element once per sublist
        counts[y] = counts.get(y, 0) + 1  # Add 1 to counts[y]

res = []                                  # Output list
for k, v in counts.items():               # Iterate through items of counts
    if v / len(X) >= 0.6:                 # Check if it appears in at least 60% of lists
        res.append(k)                     # If it does, append to res

print(res)                                # Output res

Output: ['a']
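The same idea can be wrapped in a reusable function with the percentage as a parameter, still without imports. common_elements() is a hypothetical name; comparing count * 100 against percent * len(lists) keeps the check in integer arithmetic, which sidesteps floating-point rounding when percent is an integer:

```python
def common_elements(lists, percent=60):
    """Hypothetical helper: elements found in at least `percent`% of `lists`."""
    counts = {}
    for sub in lists:
        for item in set(sub):  # count each item at most once per sublist
            counts[item] = counts.get(item, 0) + 1
    # Equivalent to count / len(lists) >= percent / 100, without floats
    return [item for item, c in counts.items() if c * 100 >= percent * len(lists)]

X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]
print(common_elements(X))      # ['a']
print(common_elements(X, 20))  # every element, in set-dependent order
```

With percent=20 every element qualifies, since each one appears in at least one of the five lists; the order of the returned elements depends on set iteration and may vary.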

Answer:

Here is a solution that also reports how many lists contain each element and the corresponding frequency:

X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]

# Collect every distinct element across all sublists
all_members = set()
for lst in X:
    all_members = all_members.union(set(lst))

stats = {}        # number of sublists containing each member
frequencies = {}  # fraction of sublists containing each member
for member in all_members:
    stats[member] = 0
    for lst in X:
        if member in lst:
            stats[member] = stats[member] + 1
    frequencies[member] = stats[member] / len(X)
l = []
for member in frequencies.keys():
    if frequencies[member] >= 0.6:
        l.append(member)
print("number of occurrences in X:\n", stats)
print("frequencies:\n", frequencies)
print("list of members occurring in at least 60% of the lists:\n", l)

Output (dictionary order may vary between runs):

number of occurrences in X:
 {'i': 1, 'j': 1, 'b': 1, 'd': 1, 'x': 1, 'a': 3, 'y': 1, 'c': 1, 'f': 1, 'e': 1, 'u': 1, 'g': 1, 'h': 1}
frequencies:
 {'i': 0.2, 'j': 0.2, 'b': 0.2, 'd': 0.2, 'x': 0.2, 'a': 0.6, 'y': 0.2, 'c': 0.2, 'f': 0.2, 'e': 0.2, 'u': 0.2, 'g': 0.2, 'h': 0.2}
list of members occurring in at least 60% of the lists:
 ['a']
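A more compact sketch of the same counting, assuming the X from the question, builds the union with set().union(*X) and counts membership with sum(), since True counts as 1:

```python
X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]

# Union of all sublists, like all_members above.
all_members = set().union(*X)

# For each member, count the sublists that contain it.
stats = {m: sum(m in lst for lst in X) for m in all_members}
frequencies = {m: n / len(X) for m, n in stats.items()}

res = [m for m, f in frequencies.items() if f >= 0.6]
print(res)  # ['a']
```

Like the longer version, this performs one membership test per (member, sublist) pair, which is fine for small inputs such as this one.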