Group pandas DataFrame on column and sum it while retaining the number of summed observations

Time:01-16

I have a pandas dataframe that looks like this:

import pandas as pd
df = pd.DataFrame({'id':[1, 1, 2, 2], 'comp': [-0.10,0.20,-0.10, 0.4], 'word': ['boy','girl','man', 'woman']})

I would like to group the DataFrame on id, calculate the sum of the corresponding comp values, and add a new column called n_obs that tracks how many rows were summed for each id.

I tried using df.groupby('id').sum(), but it does not produce the result I want.
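A minimal sketch of why plain .sum() falls short, assuming the df defined above: it returns one summed value per group for each column it aggregates, but never a row count, so n_obs has to come from a separate aggregation such as .size(). Selecting comp explicitly also keeps the string column word out of the sum.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'comp': [-0.10, 0.20, -0.10, 0.4],
                   'word': ['boy', 'girl', 'man', 'woman']})

# Selecting "comp" first keeps the string column "word" out of the sum.
summed = df.groupby('id')[['comp']].sum()

# The result holds only the summed column; there is no row count per id.
print(summed.columns.tolist())  # ['comp']

# The number of rows in each group has to be computed separately.
counts = df.groupby('id').size()
print(counts.tolist())  # [2, 2]
```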

I'd like output in the following form:

id   comp   n_obs
1    0.1    2
2    0.3    2

Any suggestions on how I can do this?

Answer:

You can use .groupby() with .agg() and named aggregation:

df.groupby("id").agg(comp=("comp", "sum"), n_obs=("id", "count"))

This outputs:

    comp  n_obs
id
1    0.1      2
2    0.3      2
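A variant sketch, assuming you want id as an ordinary column exactly as in the layout requested in the question: calling .reset_index() moves id out of the index. This version also counts with "size" on comp rather than "count" on id; "count" counts only non-null values, whereas "size" counts every row in the group. For this data both give 2.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'comp': [-0.10, 0.20, -0.10, 0.4],
                   'word': ['boy', 'girl', 'man', 'woman']})

out = (
    df.groupby("id")
    # "size" counts all rows per group; "count" would skip NaN values.
    .agg(comp=("comp", "sum"), n_obs=("comp", "size"))
    # Move "id" from the index back into a regular column.
    .reset_index()
)
print(out)
```

Named aggregation was added in pandas 0.25; on older versions, df.groupby('id')['comp'].agg(['sum', 'size']) followed by a rename of the two columns gives the same numbers.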