Home > Enterprise >  How to keep column but not include in groupby
How to keep column but not include in groupby

Time:06-16

I have a dataframe:

id   type1         type2    val
a    main       regular      3
a    main       unregular    5
a    main       large        3
b    second     regular      80
b    second     unregular    90

I want to do groupby by columns id type1 and max() by val. but when I do that:

df.groupby(["id","type1"])["val"].max().reset_index()

I get:

id   type1        val

a    main          5
b   second        90

But I want:

id   type1         type2    val

a    main       unregular    5
b    second     unregular    90

How to do that?

CodePudding user response:

You could use the idxmax function to find the index where the maximum val occurs and then map the value of type2 -

out = df.groupby(["id","type1"])["val"].agg(["max", "idxmax"])
out.loc[:, 'type2'] = out['idxmax'].map(df['type2'])

Output

           max  idxmax      type2
id type1                         
a  main      5       1  unregular
b  second   90       4  unregular

CodePudding user response:

I updated the answer. How about this

df.groupby(['id', 'type1']).aggregate({'val':'max'}).merge(df, on=['id', 'type1', 'val'], how='inner')

result looks like this

  id   type1  val      type2
0  a    main    5  unregular
1  b  second   90  unregular
  • Related