Home > Enterprise >  Create dataframe column by grouping a column and sum previous values in another column
Create dataframe column by grouping a column and sum previous values in another column

Time:10-16

I'm looking for a simplier way to create a new column that represents grouping a dataframe by ColA, and for each date in ColB, sum the previous two values in ColC. Example below:

Original DF

 ColA      ColB           ColC
 dog       10/13/2021       2
 dog       10/14/2021       1
 dog       10/15/2021       8
 cat       10/12/2021       2 
 cat       10/13/2021       5
 cat       10/14/2021       6
 cat       10/15/2021       1
 frog      10/12/2021       6
 frog      10/13/2021       9
 frog      10/14/2021       2 
 frog      10/15/2021       4

Desired DF

 ColA      ColB           ColC       NewCol
 dog       10/13/2021       2          NaN
 dog       10/14/2021       1           2
 dog       10/15/2021       8           3
 cat       10/12/2021       2          NaN
 cat       10/13/2021       5           2
 cat       10/14/2021       6           7
 cat       10/15/2021       1           11
 frog      10/12/2021       6          NaN
 frog      10/13/2021       9           6
 frog      10/14/2021       2           15
 frog      10/15/2021       4           11

I know I could do this with looping through the ColA categories and making subset dataframes, etc, etc but would be several lines of code. Any one have a speedier option/idea? thanks.

CodePudding user response:

Use lambda function with rolling and shift like:

f = lambda x: x.rolling(2, min_periods=1).sum().shift()
df['new'] = df.groupby('ColA')['ColC'].apply(f).reset_index(level=0, drop=True)
print (df)
    ColA        ColB  ColC   new
0    dog  10/13/2021     2   NaN
1    dog  10/14/2021     1   2.0
2    dog  10/15/2021     8   3.0
3    cat  10/12/2021     2   NaN
4    cat  10/13/2021     5   2.0
5    cat  10/14/2021     6   7.0
6    cat  10/15/2021     1  11.0
7   frog  10/12/2021     6   NaN
8   frog  10/13/2021     9   6.0
9   frog  10/14/2021     2  15.0
10  frog  10/15/2021     4  11.0

Or with double groupby like:

df['new'] = (df.groupby('ColA')['ColC'].rolling(2, min_periods=1).sum()
               .groupby(level=0).shift()
               .reset_index(level=0, drop=True))
print (df)
    ColA        ColB  ColC   new
0    dog  10/13/2021     2   NaN
1    dog  10/14/2021     1   2.0
2    dog  10/15/2021     8   3.0
3    cat  10/12/2021     2   NaN
4    cat  10/13/2021     5   2.0
5    cat  10/14/2021     6   7.0
6    cat  10/15/2021     1  11.0
7   frog  10/12/2021     6   NaN
8   frog  10/13/2021     9   6.0
9   frog  10/14/2021     2  15.0
10  frog  10/15/2021     4  11.0

CodePudding user response:

Try rolling with lazy groupby:

g = df.groupby('ColA')

df['ColD'] = (g['ColC'].rolling(2, min_period=1).sum()
                .reset_index(level=0, drop=True)
             )
df['ColD'] = g['ColD'].shift()

Output:

    ColA        ColB  ColC  ColD
0    dog  10/13/2021     2   NaN
1    dog  10/14/2021     1   2.0
2    dog  10/15/2021     8   3.0
3    cat  10/12/2021     2   NaN
4    cat  10/13/2021     5   2.0
5    cat  10/14/2021     6   7.0
6    cat  10/15/2021     1  11.0
7   frog  10/12/2021     6   NaN
8   frog  10/13/2021     9   6.0
9   frog  10/14/2021     2  15.0
10  frog  10/15/2021     4  11.0
  • Related