How can I get all unique values and their frequencies in each DataFrame column using groupby?
I have tried stats.mode, which reports only the most frequent value and its count. I would like to extend this to the frequencies of the other column values as well.
import pandas as pd
from scipy import stats
source = pd.DataFrame({'col1' : ['A', 'A', 'A','A'],
                       'col2' : ['a', 'a', 'a', 'b'],
                       'col3' : ['c','c','d','d']})
source.groupby(['col1'])[['col2', 'col3']].agg(stats.mode)
Result:
            col2        col3
col1
A     ([a], [3])  ([c], [2])
I would like to get:
                  col2              col3
col1
A     ([a, b], [3, 1])  ([c, d], [2, 2])
CodePudding user response:
You need a custom function:
def cust_mode(s):
    # value_counts returns each unique value with its frequency, most frequent first
    counts = s.value_counts()
    return [tuple(counts.index), tuple(counts)]
out = source.groupby(['col1'])[['col2', 'col3']].agg(cust_mode)
Output:
                  col2              col3
col1
A     [(a, b), (3, 1)]  [(c, d), (2, 2)]
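If the nested tuples are awkward to work with downstream, a long-format result may be easier to query. The following is only a sketch of one alternative, not a replacement for the function above: it reshapes the question's `source` frame with `melt` and counts values per group and column. The name `long_counts` is made up for illustration.

```python
import pandas as pd

source = pd.DataFrame({'col1': ['A', 'A', 'A', 'A'],
                       'col2': ['a', 'a', 'a', 'b'],
                       'col3': ['c', 'c', 'd', 'd']})

# Reshape to long format: one row per (col1, original column name, value).
# melt names the new columns 'variable' and 'value' by default.
long_form = source.melt(id_vars='col1')

# Count each value within every (group, column) pair.
# The result is a Series indexed by (col1, variable, value).
long_counts = long_form.groupby(['col1', 'variable'])['value'].value_counts()

print(long_counts)
```

A single frequency can then be looked up by its full key, for example `long_counts.loc[('A', 'col2', 'a')]`.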
CodePudding user response:
for k in source.columns:
    print(source[k].value_counts())
This prints every unique value and its frequency for each column. To include NaN in the counts, pass dropna=False to value_counts.
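To make the effect of `dropna=False` concrete, here is a small sketch on a made-up frame with one missing value (the data and the names `without_nan` and `with_nan` are assumptions for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical column containing one missing value.
df = pd.DataFrame({'col2': ['a', 'a', np.nan, 'b']})

# Default behaviour: NaN is left out of the counts.
without_nan = df['col2'].value_counts()

# dropna=False keeps NaN as its own entry in the result.
with_nan = df['col2'].value_counts(dropna=False)

print(without_nan)
print(with_nan)
```

With the default, the counts cover only the three non-missing rows; with `dropna=False`, all four rows are accounted for.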