I have a dataframe:
id type val
a1 q 100
a1 v 4
a1 l 17
b1 p 1
b1 j 700
b1 s 3
I want to group by id, keeping the type column and summing the values in the val column. The type for each group must be the one from the row with the highest val. So the desired result is:
id type val
a1 q 121
b1 j 704
The type for id a1 is q because that row had the highest val (100), and for b1 it is j because that row had the highest val (700). How can I do that?
CodePudding user response:
You can try this:
df.sort_values(by='val', ascending=False).groupby('id').agg({'type': 'first', 'val': 'sum'})
It gives:
type val
id
a1 q 121
b1 j 704
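For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same sort-then-aggregate idea. It assumes the sample data from the question; the use of named aggregation and as_index=False (to keep id as a regular column, as in the desired output) is my own choice and not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "id": ["a1", "a1", "a1", "b1", "b1", "b1"],
        "type": ["q", "v", "l", "p", "j", "s"],
        "val": [100, 4, 17, 1, 700, 3],
    }
)

# Sort so the row with the largest val comes first within each id,
# then take the first type and the sum of val for every group.
result = (
    df.sort_values("val", ascending=False)
    .groupby("id", as_index=False)
    .agg(type=("type", "first"), val=("val", "sum"))
)
print(result)
```

This relies on groupby preserving the row order within each group, which is why the sort has to happen before the grouping.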
CodePudding user response:
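# Row with the largest val per id; keep only id and type
df_max = df.loc[df.groupby('id')['val'].idxmax(), ['id', 'type']]
# Sum of val per id, with id kept as a regular column
df_sum = df.groupby('id', as_index=False).agg({'val': 'sum'})
df_sum.merge(df_max, on='id', how='left')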
CodePudding user response:
It sounds like this is what you're after:
import pandas as pd
df = pd.DataFrame([
['a1', 'q', 100],
['a1', 'v', 4],
['a1', 'l', 17],
['b1', 'p', 1],
['b1', 'j', 700],
['b1', 's', 3]
])
df.columns = ['id', 'type', 'val']
print(df.loc[df.groupby('id')['val'].idxmax()])
Result:
id type val
0 a1 q 100
4 b1 j 700
By the way, you say "must be the highest value" but in your desired output, it's actually the sum of those values.
If the sum is the desired result, this might meet your needs:
result = df.loc[df.groupby('id')['val'].idxmax()].set_index('id')
result['val'] = df.groupby('id')['val'].sum()
Result:
type val
id
a1 q 121
b1 j 704
Note that the name of the val column might now be confusing, depending on the actual application, as it is no longer the val, but really the sum(val) or val_sum, etc.
CodePudding user response:
In your case, set type as the index so that idxmax returns the type label directly:
out = df.set_index('type').groupby('id').val.agg(['sum','idxmax']).reset_index()
Result:
id sum idxmax
0 a1 121 q
1 b1 704 j
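The columns here come out named sum and idxmax rather than val and type. If the exact layout from the question is needed, something along these lines should work; it is a sketch assuming the question's sample data, and the rename and column reordering are my additions rather than part of the answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "id": ["a1", "a1", "a1", "b1", "b1", "b1"],
        "type": ["q", "v", "l", "p", "j", "s"],
        "val": [100, 4, 17, 1, 700, 3],
    }
)

# With type as the index, idxmax returns the type label of the largest val.
out = (
    df.set_index("type")
    .groupby("id")["val"]
    .agg(["sum", "idxmax"])
    .reset_index()
    # Rename the aggregate columns and restore the column order from the question
    .rename(columns={"sum": "val", "idxmax": "type"})[["id", "type", "val"]]
)
print(out)
```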