How to loop through and return any value if it is found inside any other column, and store it in a list using pandas? It doesn't matter how many times it is found, just that it is found at least one more time in a different column. If the value has repeated within the same column, it's not included in the list. Each value must be compared to every other value except from within the same column, if that makes sense.
combined_insp = [] test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'], 'area_2': ['John', 'Bob', 'Mary', 'Mary'], 'area_3': ['Jane', 'Sarah', 'David', 'Michael'], 'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})
Expected output would be
combined_insp = ['John', 'Mary', 'Sarah', 'Mike']
CodePudding user response:
A solution with itertools
and set algebra:
from itertools import combinations
combined_insp = set.union(*[set(test_df[c1]).intersection(test_df[c2])
for (c1, c2) in combinations(test_df.columns, 2)])
For each unique combination of columns we take the intersection of the values. Then we take the union of all the results.
CodePudding user response:
You can use pandas.apply(set)
for removing duplicated elements in each list. Then You can use itertools.chain.from_iterable
to flatten all elements to one list. At the end, you can use collections.Counter
for counting elements and returning elements that have count > 1
. (type of the result of Counter
is dict
and you can iterate over dict
with dict.items()
.)
from itertools import chain
from collections import Counter
combined_insp = [k for k,v in Counter(chain.from_iterable(test_df.apply(set))).items() if v>1]
print(combined_insp)
['Sarah', 'Mike', 'Mary', 'John']
CodePudding user response:
here is one way to do it
# pd.melt to flatted the table, then use groupby and take the names that appear more than once
g=df.melt(value_name='area').drop_duplicates().groupby('area')
[key for key, group in g if (group.count() > 1).all() ]
['John', 'Mary', 'Mike', 'Sarah']
CodePudding user response:
counts = df.melt().drop_duplicates()['value'].value_counts()
answer = counts[counts > 1].index.to_list()