How to replace value while grouping with specific one?

Time:11-11

I have a dataframe:

id     type     val   
a1      q       100
a1      v       4
a1      l       17
b1      p       1
b1      j       700
b1      s       3

I want to group by id, keeping the type column and summing the values in column val. The value in column type must be the one from the row with the highest val. So the desired result is:

id     type     val   
a1      q       121
b1      j       704

The type for id a1 is q because that row had the highest val (100), and for b1 it is j because that row had the highest val (700). How can I do that?

CodePudding user response:

You can try this:

df.sort_values(by='val', ascending=False).groupby('id').agg({'type': 'first', 'val': 'sum'})

It gives:

   type  val
id          
a1    q  121
b1    j  704

CodePudding user response:

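# The row with the highest val per id supplies the type
df_max = df.loc[df.groupby('id')['val'].idxmax(), ['id', 'type']]

# Sum of val per id, keeping id as a regular column for the merge
df_sum = df.groupby('id', as_index=False).agg({'val': 'sum'})

df_sum.merge(df_max, on='id', how='left')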

CodePudding user response:

It sounds like this is what you're after:

import pandas as pd

df = pd.DataFrame([
    ['a1', 'q', 100],
    ['a1', 'v', 4],
    ['a1', 'l', 17],
    ['b1', 'p', 1],
    ['b1', 'j', 700],
    ['b1', 's', 3]
])
df.columns = ['id', 'type', 'val']

print(df.loc[df.groupby('id')['val'].idxmax()])

Result:

   id type  val
0  a1    q  100
4  b1    j  700

By the way, you say "must be the highest value" but in your desired output, it's actually the sum of those values.

If the sum is the desired result, this might meet your needs:

result = df.loc[df.groupby('id')['val'].idxmax()].set_index('id')
result['val'] = df.groupby('id')['val'].sum()

Result:

   type  val
id          
a1    q  121
b1    j  704

Note that the name of the val column might now be confusing, depending on the actual application, as it is no longer the val, but really the sum(val) or val_sum, etc.
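
One way to avoid that naming ambiguity is named aggregation (available in pandas 0.25 and later), which lets you label the summed column explicitly. The following is only a sketch under assumptions: the column name val_sum and the helper name summarize are illustrative choices, not something from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [['a1', 'q', 100], ['a1', 'v', 4], ['a1', 'l', 17],
     ['b1', 'p', 1], ['b1', 'j', 700], ['b1', 's', 3]],
    columns=['id', 'type', 'val'],
)

def summarize(frame):
    # Hypothetical helper: sort so the row with the highest val comes
    # first within each id, then take that row's type and the sum of val.
    ordered = frame.sort_values('val', ascending=False)
    return ordered.groupby('id', as_index=False).agg(
        type=('type', 'first'),   # type of the highest-val row
        val_sum=('val', 'sum'),   # explicit name for the summed column
    )

result = summarize(df)
print(result)
```

If several rows in a group tie for the highest val, which type is kept depends on the sort order, so it may be worth adding a secondary sort key in that case.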

CodePudding user response:

In your case, you can set type as the index so that idxmax returns the type label directly:

out = df.set_index('type').groupby('id').val.agg(['sum','idxmax']).reset_index()

It gives:
   id  sum idxmax
0  a1  121      q
1  b1  704      j
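
The columns above are labelled sum and idxmax rather than val and type. If the layout from the question is needed, one option is to rename and reorder them afterwards. This is a minimal sketch that rebuilds the sample data from the question; the rename mapping is simply one way to do it.

```python
import pandas as pd

df = pd.DataFrame(
    [['a1', 'q', 100], ['a1', 'v', 4], ['a1', 'l', 17],
     ['b1', 'p', 1], ['b1', 'j', 700], ['b1', 's', 3]],
    columns=['id', 'type', 'val'],
)

out = (
    df.set_index('type')              # idxmax will now return a type label
      .groupby('id')['val']
      .agg(['sum', 'idxmax'])
      .rename(columns={'sum': 'val', 'idxmax': 'type'})
      .reset_index()[['id', 'type', 'val']]  # column order from the question
)
print(out)
```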