I have this setup of a pandas dataframe (or a csv file):
colA colB colC colD
aa aa aa aa
bb bb bb bb
cc dd cc ee
dd qq ee cc
ee ee dd dd
The goal here is to get a list/column with the set of values that appear in all the columns or in other words the entries that are common to all the columns.
Required output:
col
aa
bb
dd
ee
or a output with the common value's list:
common_list = ['aa', 'bb', 'dd', 'ee']
I have a silly solution (but it doesn't seem to be correct as I am not getting what I want when implemented to my dataframe)
import pandas as pd
df = pd.read_csv('Bus Names Concat test.csv') #i/p csv file (pandas df converted into csv)
df = df.stack().value_counts()
core_list = df[df>2].index.tolist() #defining the common list as core list
print(len(core_list))
df_core = pd.DataFrame(core_list)
print(df_core)
Any help/sugggestion/feedback to get the required o/p will be appreciated.
CodePudding user response:
You can use .intersection()
method of sets to find common values between sets of each column:
# wrapped in a list, take first column set and pass sets of other columns as arguments
common_list = list(set(df.colA).intersection(set(df.colB), set(df.colC), set(df.colD)))
sorted(common_list) # needs sorting in alphabetical order
Output:
['aa', 'bb', 'dd', 'ee']
CodePudding user response:
In your case
s = df.melt().groupby('value')['variable'].nunique()
outlist = s[s==4].index.tolist()
Out[307]: ['aa', 'bb', 'dd', 'ee']
CodePudding user response:
common_list = sorted(list(set.intersection(*map(set, df.values.transpose().tolist()))))
print(common_list)
['aa', 'bb', 'dd', 'ee']