Groupby with max on string dtype and keep all columns

Time:05-26

I'm trying to group by DocumentNo and return the max of the Concat column. However, I want the Revision column (and any other columns I add to this df) to be included in the final output.

This is an example input:

DocumentNo                 Revision  Concat
MAOIE-2100-CI-MTO-0001     4         MAOIE-2100-CI-MTO-00014
MAOIE-5100-EL-MTO-0005     3         MAOIE-5100-EL-MTO-00053
MAOIE-2100-CI-MTO-0001     5         MAOIE-2100-CI-MTO-00015
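To reproduce the problem, the sample input can be rebuilt as a DataFrame. This is a sketch assuming pandas is installed. The final check is only an observation about these three rows (Concat looks like DocumentNo with Revision appended as text), not something the question states.

```python
import pandas as pd

# Rebuild the example input from the question
df = pd.DataFrame(
    {
        "DocumentNo": [
            "MAOIE-2100-CI-MTO-0001",
            "MAOIE-5100-EL-MTO-0005",
            "MAOIE-2100-CI-MTO-0001",
        ],
        "Revision": [4, 3, 5],
        "Concat": [
            "MAOIE-2100-CI-MTO-00014",
            "MAOIE-5100-EL-MTO-00053",
            "MAOIE-2100-CI-MTO-00015",
        ],
    }
)

# In the sample rows, Concat equals DocumentNo with Revision appended as text
matches = (df["DocumentNo"] + df["Revision"].astype(str)).eq(df["Concat"]).all()
print(matches)  # True
```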

This should be the output:

DocumentNo                 Revision  Concat
MAOIE-5100-EL-MTO-0005     3         MAOIE-5100-EL-MTO-00053
MAOIE-2100-CI-MTO-0001     5         MAOIE-2100-CI-MTO-00015

I've tried this, but I get an argmax error, I think because Concat is a string column:

df = df.groupby(['DocumentNo'], as_index=False).agg({'Concat': 'max', 'Revision': 'first'})
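Even where this call runs without an error, `'first'` does not do what the question wants: each aggregation is applied to its column independently, so `Revision` comes from the first row of each group rather than from the row holding the max `Concat`. A small sketch on the sample data, assuming a pandas version where `max` on the string column succeeds:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "DocumentNo": ["MAOIE-2100-CI-MTO-0001", "MAOIE-5100-EL-MTO-0005", "MAOIE-2100-CI-MTO-0001"],
        "Revision": [4, 3, 5],
        "Concat": ["MAOIE-2100-CI-MTO-00014", "MAOIE-5100-EL-MTO-00053", "MAOIE-2100-CI-MTO-00015"],
    }
)

# Each aggregation runs per column on its own:
# 'max' picks the largest Concat string, 'first' picks the Revision of the group's first row
out = df.groupby(["DocumentNo"], as_index=False).agg({"Concat": "max", "Revision": "first"})
print(out)
# For MAOIE-2100-CI-MTO-0001, Concat comes from the Revision 5 row but Revision is 4
```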

This works, but only returns the DocumentNo and Concat columns:

df = df.groupby('DocumentNo')['Concat'].max()
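One caveat about `max` on a string column, assuming `Concat` is always DocumentNo plus Revision as in the sample: strings compare character by character, so once revisions reach two digits, revision 10 would rank below revision 9. A sketch that sidesteps this takes `idxmax` of the numeric `Revision` per group and selects the full rows with `.loc`, which keeps every column. The two-row frame below is hypothetical data, and the approach assumes the DataFrame index has unique labels.

```python
import pandas as pd

# Hypothetical data: one document with revisions 9 and 10
df = pd.DataFrame(
    {
        "DocumentNo": ["MAOIE-2100-CI-MTO-0001", "MAOIE-2100-CI-MTO-0001"],
        "Revision": [9, 10],
    }
)
# Assumption: Concat is built as DocumentNo + Revision, as in the sample rows
df["Concat"] = df["DocumentNo"] + df["Revision"].astype(str)

# Lexicographic max: "...00019" beats "...000110" because '9' > '1'
string_max = df.groupby("DocumentNo")["Concat"].max()
print(string_max)

# idxmax on the numeric column returns the index label of the highest Revision;
# .loc pulls back those full rows, so all columns are kept
latest = df.loc[df.groupby("DocumentNo")["Revision"].idxmax()]
print(latest)
```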

Answer:

This should work:

df.groupby("DocumentNo").apply(lambda d:d.loc[d["Concat"]==d["Concat"].max()])

Probably a faster way exists.
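One vectorized alternative that avoids a Python-level `apply` is to broadcast each group's max back onto every row with `transform("max")` and filter with a boolean mask. Whether it is actually faster depends on the data, so treat this as a sketch rather than a benchmark. Unlike `apply`, it keeps the original index and row order, and if two rows tie for the max, both are kept.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "DocumentNo": ["MAOIE-2100-CI-MTO-0001", "MAOIE-5100-EL-MTO-0005", "MAOIE-2100-CI-MTO-0001"],
        "Revision": [4, 3, 5],
        "Concat": ["MAOIE-2100-CI-MTO-00014", "MAOIE-5100-EL-MTO-00053", "MAOIE-2100-CI-MTO-00015"],
    }
)

# transform returns a Series aligned to df: each row gets its group's max Concat
group_max = df.groupby("DocumentNo")["Concat"].transform("max")

# Keep only rows whose Concat equals their group's max; all columns survive
result = df[df["Concat"] == group_max]
print(result)
```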

Answer:

You can try a merge after finding the groupby max:

df = df.merge(df.groupby('DocumentNo', as_index=False)['Concat'].max(), how='right')
print(df)

               DocumentNo  Revision                   Concat
0  MAOIE-2100-CI-MTO-0001         5  MAOIE-2100-CI-MTO-00015
1  MAOIE-5100-EL-MTO-0005         3  MAOIE-5100-EL-MTO-00053
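Without an `on=` argument, `merge` joins on every column the two frames share, which here is `DocumentNo` and `Concat`. A sketch that names those keys explicitly, so adding more columns later cannot silently change the join, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "DocumentNo": ["MAOIE-2100-CI-MTO-0001", "MAOIE-5100-EL-MTO-0005", "MAOIE-2100-CI-MTO-0001"],
        "Revision": [4, 3, 5],
        "Concat": ["MAOIE-2100-CI-MTO-00014", "MAOIE-5100-EL-MTO-00053", "MAOIE-2100-CI-MTO-00015"],
    }
)

# One row per DocumentNo holding its max Concat
max_concat = df.groupby("DocumentNo", as_index=False)["Concat"].max()

# Match full rows of df against the (DocumentNo, max Concat) pairs;
# every other column of df (Revision, ...) comes along
result = df.merge(max_concat, on=["DocumentNo", "Concat"], how="right")
print(result)
```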