Home > Mobile >  calculate sum of a column after filtering by and grouping on other columns
calculate sum of a column after filtering by and grouping on other columns

Time:05-31

I use this code to first filter out my dataframe according to a particular column role.

Next, I calculate the sum of the value column and add it to the newColumn. For now, I get the same value in all rows of the newCol. However, now I want to group by the sum per each value of the id col. How can I achieve that?

    mask = chunk["role"].eq("senior")
    chunk.loc[mask, 'newColumn'] = chunk.loc[mask, 'value'].sum()

df

id    role      value
1     junior    2
1     senior    3
1     senior    4
2     junior    2
2     senior    6

Desired Output:

id    role      value     sum
1     junior    2
1     senior    3         7
1     senior    4         7
2     junior    2
2     senior    6         8
2     senior    2         8

So I am basically only considering rows where the role is senior

CodePudding user response:

IIUC, you can try query where role value is senior then use groupby.transform

df['sum'] = (df.query('role == "senior"')
             .groupby('id')['value'].transform('sum'))
print(df)

   id    role  value  sum
0   1  junior      2  NaN
1   1  senior      3  7.0
2   1  senior      4  7.0
3   2  junior      2  NaN
4   2  senior      6  8.0
5   2  senior      2  8.0

CodePudding user response:

You should consider adding samples of your dataframe to make it easier for eveyrone to help. Check my sample below - this is what the df structure should be given as when you're asking question.

Something like this should work:

# Sample of the data frame
>>> df = pd.DataFrame({
       'id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2},
       'role': {0: 'junior', 1: 'senior', 2: 'senior', 3: 'junior', 4: 'senior'},
       'value': {0: 2, 1: 3, 2: 4, 3: 2, 4: 6}})

# Massaging the data to achieve desired result
>>> print(df
        .merge(df.groupby(['id','role'], as_index=False)[['value']]
        .agg(sum), on=['id','role'])
        .rename({'value_x': 'value', 'value_y': 'sum'}, axis=1))


id  role    value   sum
0   1   junior  2   2
1   1   senior  3   7
2   1   senior  4   7
3   2   junior  2   2
4   2   senior  6   6
  • Related