I'm trying to group by DocumentNo and return the max of the Concat column. However, I want the Revision column (and any other columns I add to this df) to be included in the final output.
This is an example input:
DocumentNo Revision Concat
MAOIE-2100-CI-MTO-0001 4 MAOIE-2100-CI-MTO-00014
MAOIE-5100-EL-MTO-0005 3 MAOIE-5100-EL-MTO-00053
MAOIE-2100-CI-MTO-0001 5 MAOIE-2100-CI-MTO-00015
This should be the output:
DocumentNo Revision Concat
MAOIE-5100-EL-MTO-0005 3 MAOIE-5100-EL-MTO-00053
MAOIE-2100-CI-MTO-0001 5 MAOIE-2100-CI-MTO-00015
I've tried this, but I get an argmax error, I think because Concat is a string column:
df = df.groupby(['DocumentNo'], as_index=False).agg({'Concat': 'max', 'Revision': 'first'})
This works, but only returns the DocumentNo and Concat columns:
df = df.groupby('DocumentNo')['Concat'].max()
CodePudding user response:
This should work:
df.groupby("DocumentNo").apply(lambda d:d.loc[d["Concat"]==d["Concat"].max()])
Probably a faster way exists.
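One candidate for that faster way, as a sketch only: compute each group's max with transform and filter with a boolean mask, which avoids calling a Python lambda per group. The names group_max and result are just illustrative, and the frame is rebuilt from the example input in the question.

```python
import pandas as pd

# Example input from the question
df = pd.DataFrame({
    "DocumentNo": ["MAOIE-2100-CI-MTO-0001", "MAOIE-5100-EL-MTO-0005", "MAOIE-2100-CI-MTO-0001"],
    "Revision": [4, 3, 5],
    "Concat": ["MAOIE-2100-CI-MTO-00014", "MAOIE-5100-EL-MTO-00053", "MAOIE-2100-CI-MTO-00015"],
})

# transform("max") returns a Series aligned with df, holding each row's group max
group_max = df.groupby("DocumentNo")["Concat"].transform("max")

# Keep only the rows whose Concat equals the max of their group; all columns survive
result = df[df["Concat"] == group_max].reset_index(drop=True)
print(result)
```

Like the apply version, this keeps every row that ties for the group max, so duplicates in Concat would all be returned.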
CodePudding user response:
You can try merge after finding the groupby max:
df = df.merge(df.groupby('DocumentNo', as_index=False)['Concat'].max(), how='right')
print(df)
DocumentNo Revision Concat
0 MAOIE-2100-CI-MTO-0001 5 MAOIE-2100-CI-MTO-00015
1 MAOIE-5100-EL-MTO-0005 3 MAOIE-5100-EL-MTO-00053
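One caveat worth checking, assuming Revision is numeric and can reach two digits: max on a string column compares character by character, so a Concat ending in 9 sorts above one ending in 10. The sketch below uses made-up document numbers (DOC-A, DOC-B) purely for illustration and picks the latest row with idxmax on Revision instead.

```python
import pandas as pd

# Hypothetical data (not from the question) with a two-digit revision
df = pd.DataFrame({
    "DocumentNo": ["DOC-A", "DOC-A", "DOC-B"],
    "Revision": [9, 10, 3],
})
df["Concat"] = df["DocumentNo"] + df["Revision"].astype(str)

# String max is lexicographic: "DOC-A9" beats "DOC-A10"
by_string = df.groupby("DocumentNo")["Concat"].max()

# idxmax on the numeric column returns the index label of each group's highest revision
latest = df.loc[df.groupby("DocumentNo")["Revision"].idxmax()].reset_index(drop=True)
print(latest)
```

If revisions never exceed a single digit, the string max and the numeric max agree and either approach is fine.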