Get the list of values/entries that are common to all the columns in a dataframe (or a csv file) in-CodePudding

I have this setup of a pandas dataframe (or a csv file):

colA colB colC colD
 aa   aa   aa   aa
 bb   bb   bb   bb
 cc   dd   cc   ee
 dd   qq   ee   cc
 ee   ee   dd   dd

The goal here is to get a list/column with the set of values that appear in all the columns or in other words the entries that are common to all the columns.

Required output:

col
aa
bb
dd
ee

or a output with the common value's list:

common_list = ['aa', 'bb', 'dd', 'ee']

I have a silly solution (but it doesn't seem to be correct as I am not getting what I want when implemented to my dataframe)

import pandas as pd

df = pd.read_csv('Bus Names Concat test.csv') #i/p csv file (pandas df converted into csv)

df = df.stack().value_counts() 
core_list = df[df>2].index.tolist() #defining the common list as core list
print(len(core_list))
df_core = pd.DataFrame(core_list)

print(df_core)

Any help/sugggestion/feedback to get the required o/p will be appreciated.

CodePudding user response：

You can use .intersection() method of sets to find common values between sets of each column:

# wrapped in a list, take first column set and pass sets of other columns as arguments
common_list = list(set(df.colA).intersection(set(df.colB), set(df.colC), set(df.colD)))
sorted(common_list)  # needs sorting in alphabetical order

Output:

['aa', 'bb', 'dd', 'ee']

CodePudding user response：

In your case

s = df.melt().groupby('value')['variable'].nunique()
outlist = s[s==4].index.tolist()
Out[307]: ['aa', 'bb', 'dd', 'ee']

CodePudding user response：

common_list = sorted(list(set.intersection(*map(set, df.values.transpose().tolist()))))
print(common_list)

['aa', 'bb', 'dd', 'ee']