How to check if an element exists in 60% of the list of lists?

I have a list of 5 lists:

X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]

I'm trying to get a list of the elements that exist in at least 60% of the lists in X.

I'd want it to return the single element ['a'], because 'a' appears in 3 of the 5 lists, i.e. in 60% of the lists in X.

CodePudding user response：

One approach using collections.Counter:

from collections import Counter
from itertools import chain

X = [['a', 'b', 'c'], ['a', 'd', 'e'], ['a', 'x', 'f'], ['g', 'h', 'j'], ['y', 'u', 'i']]

counts = Counter(chain.from_iterable(set(li) for li in X))
threshold = 0.6 * len(X)  # no int(): truncating would lower the bar when 0.6 * len(X) is not whole
res = []
for key, count in counts.most_common():
    if count >= threshold:
        res.append(key)
    else:
        break

print(res)

Output

['a']

Note that this solution only counts each item one time per list (set(li)).
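To make the effect of set(li) concrete, here is a small sketch comparing the two ways of counting. The input X_dup is hypothetical (it is not the data from the question) and was chosen so that one sublist contains a duplicate.

```python
from collections import Counter
from itertools import chain

# Hypothetical input: 'a' occurs twice in the first list and once in the second.
X_dup = [['a', 'a', 'b'], ['a', 'c'], ['d'], ['e'], ['f']]

# Deduplicate each sublist first: counts how many lists contain each element.
per_list = Counter(chain.from_iterable(set(li) for li in X_dup))

# No deduplication: counts total occurrences across all lists.
total = Counter(chain.from_iterable(X_dup))

print(per_list['a'] / len(X_dup))  # 0.4
print(total['a'] / len(X_dup))     # 0.6
```

With per-list counting 'a' stays below the 60% mark, while raw occurrence counting would let it through.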

CodePudding user response：

You can count how many of the nested lists each char appears in, then check whether that share is at least 60%. A defaultdict(int) starts every count at zero.

from collections import defaultdict

X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]

cnt = defaultdict(int)
    
# "for l in set(lst):" counts each char at most once per nested list.
# Example: X = [['a','b','c'],['a','a','e'],['b','x','f'],['g','h','j'],['y','u','i']]
# With set(lst), 'a' is found in 2 of the 5 lists (40%).
# With "for l in lst:" every occurrence counts, so 'a' reaches 3 (60%).

for lst in X:
    for l in set(lst):
        cnt[l] += 1
print(dict(cnt))
# {'a': 3, 'b': 1, 'c': 1, 
#  'd': 1, 'e': 1, 'x': 1, 
#  'f': 1, 'g': 1, 'h': 1, 
#  'j': 1, 'y': 1, 'u': 1, 'i': 1}


res = [k for k,v in cnt.items() if v/len(X) >= 0.6]
print(res)
# ['a']

CodePudding user response：

The easiest approach is to loop over the unique elements of X in a list comprehension and check whether each element is present at least once in at least 60% of the sub-arrays.

import numpy as np
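X = np.array([["a","b","c"],["a","d","e"],["a","x","f"],["g","h","j"],["y","u","i"]])
# (X == element).any(axis=1) marks the rows containing element; .mean() is the share of such rows
res = [str(element) for element in np.unique(X) if (X == element).any(axis=1).mean() >= .6]
print(res)
# ['a']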
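The same idea, the mean of a per-row membership test, can also be sketched in plain Python. The names unique and share below are illustrative only. This variant does not need X to be a rectangular 2-D array, which the NumPy comparison relies on.

```python
X = [['a', 'b', 'c'], ['a', 'd', 'e'], ['a', 'x', 'f'], ['g', 'h', 'j'], ['y', 'u', 'i']]

# Unique elements across all sublists, in sorted order.
unique = sorted({element for sub in X for element in sub})

# For each element, the fraction of sublists that contain it at least once.
share = {element: sum(element in sub for sub in X) / len(X) for element in unique}

res = [element for element in unique if share[element] >= 0.6]
print(res)  # ['a']
```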

CodePudding user response：

One approach without any imports:

# X is the list of lists from the question
counts = {}                               # Number of lists each element appears in
for x in X:                               # Iterate through the lists in X
    for y in set(x):                      # Each distinct element of x, counted once per list
        counts[y] = counts.get(y, 0) + 1  # Add 1 to counts[y]

res = []                                  # Output list
for k, v in counts.items():               # Iterate through items of counts
    if v / len(X) >= 0.6:                 # Check if it appears in at least 60% of the lists
        res.append(k)                     # If it does, append to res

print(res)                                # Output res

Output: ['a']

CodePudding user response：

Here is the solution to get the list:

all_members = set()
for sub in X:                      # "sub" avoids shadowing the built-in list
    all_members = all_members.union(sub)

stats = {}
frequencies = {}
for member in all_members:
    stats[member] = 0
    for sub in X:
        if member in sub:
            stats[member] = stats[member] + 1
            frequencies[member] = stats[member] / len(X)
l = []
for member in frequencies.keys():
    if frequencies[member] >= 0.6:
        l.append(member)
print("number of occurrences in X:\n",stats)
print("frequencies:\n",frequencies)
print("list of members occurring more than 60% of times:\n",l)

number of occurrences in X:
 {'i': 1, 'j': 1, 'b': 1, 'd': 1, 'x': 1, 'a': 3, 'y': 1, 'c': 1, 'f': 1, 'e': 1, 'u': 1, 'g': 1, 'h': 1}
frequencies:
 {'i': 0.2, 'j': 0.2, 'b': 0.2, 'd': 0.2, 'x': 0.2, 'a': 0.6, 'y': 0.2, 'c': 0.2, 'f': 0.2, 'e': 0.2, 'u': 0.2, 'g': 0.2, 'h': 0.2}
list of members occurring more than 60% of times:
 ['a']
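
The answers above share two steps: count how many lists each element appears in, then keep the elements whose share reaches the threshold. A minimal sketch of a reusable version follows. The function name common_elements and its percent parameter are hypothetical, and the sketch assumes the elements are hashable and sortable. It compares integers (count * 100 >= percent * len(lists)), so no floating-point rounding is involved.

```python
from collections import Counter

def common_elements(lists, percent=60):
    """Return, sorted, the elements present in at least `percent` percent of `lists`."""
    # Count each element once per sublist.
    counts = Counter(element for sub in lists for element in set(sub))
    # Same test as count / len(lists) >= percent / 100, written without division.
    return sorted(e for e, c in counts.items() if c * 100 >= percent * len(lists))

X = [['a', 'b', 'c'], ['a', 'd', 'e'], ['a', 'x', 'f'], ['g', 'h', 'j'], ['y', 'u', 'i']]

print(common_elements(X))           # ['a']
print(len(common_elements(X, 20)))  # 13
```

Lowering percent to 20 returns every element, since each one appears in at least one of the five lists.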