Home > Mobile >  Create column that groups and transform data from the day before
Create column that groups and transform data from the day before

Time:10-28

I need to create a new column that will group by ColA, and the result will be the summation of values in ColC of the day before -- example below:

Original DF

 ColA     ColB         ColC
  A     10/20/2021      2
  A     10/20/2021      8
  A     10/21/2021      1
  A     10/21/2021      5
  A     10/22/2021      3
  A     10/22/2021      4
  B     10/20/2021      9
  B     10/20/2021      1
  B     10/21/2021      6
  B     10/21/2021      4 
  B     10/22/2021      2
  B     10/22/2021      7 
  C     10/20/2021      6
  C     10/20/2021      3
  C     10/21/2021      8
  C     10/21/2021      8
  C     10/22/2021      2
  C     10/22/2021      6

Desired DF where new column is ColD

 ColA     ColB         ColC      ColD
  A     10/20/2021      2         NA
  A     10/20/2021      8         NA
  A     10/21/2021      1         10
  A     10/21/2021      5         10
  A     10/22/2021      3          6
  A     10/22/2021      4          6
  B     10/20/2021      9         NA
  B     10/20/2021      1         NA
  B     10/21/2021      6         10
  B     10/21/2021      4         10
  B     10/22/2021      2         10
  B     10/22/2021      7         10
  C     10/20/2021      6         NA
  C     10/20/2021      3         NA
  C     10/21/2021      8         9
  C     10/21/2021      8         9
  C     10/22/2021      2         16
  C     10/22/2021      6         16

Any suggestions is appreciated!

CodePudding user response:

Doing a groupby then we shift it within the colA and merge back

out = df.merge(df.groupby(['ColA','ColB'])['ColC'].sum().groupby(level=0).shift().reset_index(name='ColD'))
Out[71]: 
   ColA        ColB  ColC  ColD
0     A  10/20/2021     2   NaN
1     A  10/20/2021     8   NaN
2     A  10/21/2021     1  10.0
3     A  10/21/2021     5  10.0
4     A  10/22/2021     3   6.0
5     A  10/22/2021     4   6.0
6     B  10/20/2021     9   NaN
7     B  10/20/2021     1   NaN
8     B  10/21/2021     6  10.0
9     B  10/21/2021     4  10.0
10    B  10/22/2021     2  10.0
11    B  10/22/2021     7  10.0
12    C  10/20/2021     6   NaN
13    C  10/20/2021     3   NaN
14    C  10/21/2021     8   9.0
15    C  10/21/2021     8   9.0
16    C  10/22/2021     2  16.0
17    C  10/22/2021     6  16.0

CodePudding user response:

First convert your date to a datetime. Then groupby 'ColA' and 'ColB' shifted by exactly one day. This way when you merge the result of the groupby back it aligns on one day earlier.

import pandas as pd

df['ColB'] = pd.to_datetime(df['ColB'], format='%m/%d/%Y')

s = (df.groupby(['ColA', df['ColB'] pd.offsets.DateOffset(days=1)])
       ['ColC'].sum().rename('ColD'))

df = df.merge(s, on=['ColA', 'ColB'], how='left')

print(df)

   ColA       ColB  ColC  ColD
0     A 2021-10-20     2   NaN
1     A 2021-10-20     8   NaN
2     A 2021-10-21     1  10.0
3     A 2021-10-21     5  10.0
4     A 2021-10-22     3   6.0
5     A 2021-10-22     4   6.0
6     B 2021-10-20     9   NaN
7     B 2021-10-20     1   NaN
8     B 2021-10-21     6  10.0
9     B 2021-10-21     4  10.0
10    B 2021-10-22     2  10.0
11    B 2021-10-22     7  10.0
12    C 2021-10-20     6   NaN
13    C 2021-10-20     3   NaN
14    C 2021-10-21     8   9.0
15    C 2021-10-21     8   9.0
16    C 2021-10-22     2  16.0
17    C 2021-10-22     6  16.0
  • Related