Python - pandas, group by and max count

Time:12-07

For each value in column cluster-1, I need the most frequent value (max count) in column cluster-2.

Input - data

Input data

Output - data

output

And if I wanted the output to look like this instead, how would I do it?

Output2 - data

output-2

I use the command df.groupby(['cluster-1','cluster-2'])['cluster-2'].count(), which gives me the number of occurrences of each cluster-2 value within each cluster-1 group. I need advice on how to proceed from there, thanks.

CodePudding user response:

Use SeriesGroupBy.value_counts, which by default sorts the counts in descending order within each group. You can then convert the resulting MultiIndex to a DataFrame with MultiIndex.to_frame and keep only the first (most frequent) row per cluster-1 value with DataFrame.drop_duplicates:

df1 = (df.groupby(['cluster-1'])['cluster-2']
         .value_counts()
         .index
         .to_frame(index=False)
         .drop_duplicates('cluster-1'))
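
Since the original input images are not available, here is a minimal sketch on made-up sample data (the values in df below are an assumption, not the asker's data). It applies the same chain and also shows a variant that keeps the count column, in case that is what the second output needs:

```python
import pandas as pd

# Hypothetical sample data; the question's real input was posted as an image.
df = pd.DataFrame({
    "cluster-1": [1, 1, 1, 2, 2, 2, 2],
    "cluster-2": ["a", "a", "b", "c", "d", "d", "d"],
})

# Counts of cluster-2 values per cluster-1 group, sorted descending within each group.
counts = df.groupby("cluster-1")["cluster-2"].value_counts()

# Same approach as above: keep the first (most frequent) cluster-2 per cluster-1.
df1 = counts.index.to_frame(index=False).drop_duplicates("cluster-1")
print(df1)

# Variant that also keeps the count itself; name="count" avoids a column-name
# clash with the existing cluster-2 index level.
with_counts = counts.reset_index(name="count").drop_duplicates("cluster-1")
print(with_counts)
```

Note that when two cluster-2 values are tied within a group, drop_duplicates simply keeps whichever one value_counts lists first.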