I have a list of 5 lists:
X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]
I'm trying to find out how to get a list of the elements that exist in at least 60% of the lists in X.
So I'd want it to return a single element, ['a'], because 'a' exists in 3 of the 5 lists, i.e. in 60% of the lists in X.
CodePudding user response:
One approach using collections.Counter:
from collections import Counter
from itertools import chain
X = [['a', 'b', 'c'], ['a', 'd', 'e'], ['a', 'x', 'f'], ['g', 'h', 'j'], ['y', 'u', 'i']]
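# Count each element at most once per sublist by deduplicating with set()
counts = Counter(chain.from_iterable(set(li) for li in X))
# Minimum number of sublists an element must appear in (60% of them)
threshold = 0.6 * len(X)
res = []
# most_common() yields counts in descending order, so stop at the first miss
for key, count in counts.most_common():
    if count >= threshold:
        res.append(key)
    else:
        break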
print(res)
Output
['a']
Note that this solution counts each item only once per list (set(li)).
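The same counting idea can be wrapped in a small reusable function. The sketch below is only an illustration: the function name elements_in_share and its min_share parameter are made up for this example and are not part of the answer above.

```python
from collections import Counter
from itertools import chain


def elements_in_share(lists, min_share):
    """Return elements that occur in at least min_share of the given lists."""
    # set() ensures an element is counted at most once per sublist
    counts = Counter(chain.from_iterable(set(li) for li in lists))
    needed = min_share * len(lists)
    # sorted() gives a predictable order for the result
    return sorted(key for key, count in counts.items() if count >= needed)


X = [['a', 'b', 'c'], ['a', 'd', 'e'], ['a', 'x', 'f'], ['g', 'h', 'j'], ['y', 'u', 'i']]
print(elements_in_share(X, 0.6))  # ['a']
print(len(elements_in_share(X, 0.2)))  # 13
```

Passing the share as an argument makes it easy to try other cut-offs without touching the counting logic.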
CodePudding user response:
You can count how many of the nested lists each char appears in, using the char as the key, and then check whether that count reaches 60% of the lists. Using defaultdict(int) makes every count start from zero.
from collections import defaultdict
X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]
cnt = defaultdict(int)
# To count each char only once per nested list, use: "for l in set(lst):"
# Example: X = [['a','b','c'],['a','a','e'],['b','x','f'],['g','h','j'],['y','u','i']]
# With set(lst), 'a' is found in only 2 of the 5 lists (40%).
# If duplicates within a list should count, use "for l in lst:" instead;
# 'a' is then counted 3 times (60%) for the example above.
for lst in X:
    for l in set(lst):
        cnt[l] += 1
print(cnt)
# {'a': 3, 'b': 1, 'c': 1,
# 'd': 1, 'e': 1, 'x': 1,
# 'f': 1, 'g': 1, 'h': 1,
# 'j': 1, 'y': 1, 'u': 1, 'i': 1}
res = [k for k,v in cnt.items() if v/len(X) >= 0.6]
print(res)
# ['a']
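To make the difference described in the comments concrete, here is a small sketch that runs both counting styles on the alternative X from the comment. The helper name count_chars and its once_per_list flag are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict


def count_chars(lists, once_per_list):
    """Count chars across nested lists, optionally once per nested list."""
    cnt = defaultdict(int)
    for lst in lists:
        # set(lst) drops duplicates inside a single nested list
        for ch in (set(lst) if once_per_list else lst):
            cnt[ch] += 1
    return cnt


X2 = [['a','b','c'],['a','a','e'],['b','x','f'],['g','h','j'],['y','u','i']]

per_list = count_chars(X2, once_per_list=True)
raw = count_chars(X2, once_per_list=False)

print(per_list['a'], raw['a'])  # 2 3
print([k for k, v in per_list.items() if v / len(X2) >= 0.6])  # []
print([k for k, v in raw.items() if v / len(X2) >= 0.6])  # ['a']
```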
CodePudding user response:
A compact approach is to loop over the unique elements of X in a list comprehension and check whether each element is present at least once in at least 60% of the sub-arrays. Note that this relies on all sublists having the same length, so that X forms a 2-D array.
import numpy as np
X = np.array([["a","b","c"],["a","d","e"],["a","x","f"],["g","h","j"],["y","u","i"]])
[element for element in np.unique(X) if (X==element).any(axis=1).mean()>=.6]
#['a']
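The one-liner packs three steps into a single expression. As a rough breakdown for the single element "a" (the intermediate variable names are only for illustration):

```python
import numpy as np

X = np.array([["a","b","c"],["a","d","e"],["a","x","f"],["g","h","j"],["y","u","i"]])

mask = (X == "a")               # boolean array with the same 5x3 shape as X
rows_with_a = mask.any(axis=1)  # one boolean per row: does the row contain "a"?
share = rows_with_a.mean()      # True counts as 1, so the mean is the fraction of rows

print(rows_with_a.tolist())  # [True, True, True, False, False]
print(share >= 0.6)          # True
```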
CodePudding user response:
One approach without any imports:
X = [['a','b','c'],['a','d','e'],['a','x','f'],['g','h','j'],['y','u','i']]
counts = {}  # Number of lists each element appears in
for x in X:  # Iterate through the lists in X
    for y in set(x):  # Iterate through the unique elements of x
        counts[y] = counts.get(y, 0) + 1  # Add 1 to counts[y]
res = []  # Output list
for k, v in counts.items():  # Iterate through items of counts
    if v / len(X) >= 0.6:  # Check if it appears in at least 60% of the lists
        res.append(k)  # If it does, append to res
print(res)  # Output res
Output: ['a']
CodePudding user response:
Here is the solution to get the list:
# Collect every distinct element that appears anywhere in X
all_members = set()
for sublist in X:
    all_members = all_members.union(set(sublist))
stats = {}        # number of sublists containing each member
frequencies = {}  # share of sublists containing each member
for member in all_members:
    stats[member] = 0
    for sublist in X:
        if member in sublist:
            stats[member] += 1
    frequencies[member] = stats[member] / len(X)
l = []
for member in frequencies.keys():
    if frequencies[member] >= 0.6:
        l.append(member)
print("number of occurrences in X:\n", stats)
print("frequencies:\n", frequencies)
print("list of members occurring in at least 60% of the lists:\n", l)
Output:
number of occurrences in X:
{'i': 1, 'j': 1, 'b': 1, 'd': 1, 'x': 1, 'a': 3, 'y': 1, 'c': 1, 'f': 1, 'e': 1, 'u': 1, 'g': 1, 'h': 1}
frequencies:
{'i': 0.2, 'j': 0.2, 'b': 0.2, 'd': 0.2, 'x': 0.2, 'a': 0.6, 'y': 0.2, 'c': 0.2, 'f': 0.2, 'e': 0.2, 'u': 0.2, 'g': 0.2, 'h': 0.2}
list of members occurring in at least 60% of the lists:
['a']