I need to create a new column that will group by ColA, and the result will be the summation of values in ColC of the day before -- example below:
Original DF
ColA ColB ColC
A 10/20/2021 2
A 10/20/2021 8
A 10/21/2021 1
A 10/21/2021 5
A 10/22/2021 3
A 10/22/2021 4
B 10/20/2021 9
B 10/20/2021 1
B 10/21/2021 6
B 10/21/2021 4
B 10/22/2021 2
B 10/22/2021 7
C 10/20/2021 6
C 10/20/2021 3
C 10/21/2021 8
C 10/21/2021 8
C 10/22/2021 2
C 10/22/2021 6
Desired DF where new column is ColD
ColA ColB ColC ColD
A 10/20/2021 2 NA
A 10/20/2021 8 NA
A 10/21/2021 1 10
A 10/21/2021 5 10
A 10/22/2021 3 6
A 10/22/2021 4 6
B 10/20/2021 9 NA
B 10/20/2021 1 NA
B 10/21/2021 6 10
B 10/21/2021 4 10
B 10/22/2021 2 10
B 10/22/2021 7 10
C 10/20/2021 6 NA
C 10/20/2021 3 NA
C 10/21/2021 8 9
C 10/21/2021 8 9
C 10/22/2021 2 16
C 10/22/2021 6 16
Any suggestions is appreciated!
CodePudding user response:
Doing a groupby
then we shift
it within the colA and merge
back
out = df.merge(df.groupby(['ColA','ColB'])['ColC'].sum().groupby(level=0).shift().reset_index(name='ColD'))
Out[71]:
ColA ColB ColC ColD
0 A 10/20/2021 2 NaN
1 A 10/20/2021 8 NaN
2 A 10/21/2021 1 10.0
3 A 10/21/2021 5 10.0
4 A 10/22/2021 3 6.0
5 A 10/22/2021 4 6.0
6 B 10/20/2021 9 NaN
7 B 10/20/2021 1 NaN
8 B 10/21/2021 6 10.0
9 B 10/21/2021 4 10.0
10 B 10/22/2021 2 10.0
11 B 10/22/2021 7 10.0
12 C 10/20/2021 6 NaN
13 C 10/20/2021 3 NaN
14 C 10/21/2021 8 9.0
15 C 10/21/2021 8 9.0
16 C 10/22/2021 2 16.0
17 C 10/22/2021 6 16.0
CodePudding user response:
First convert your date to a datetime
. Then groupby 'ColA' and 'ColB' shifted by exactly one day. This way when you merge the result of the groupby
back it aligns on one day earlier.
import pandas as pd
df['ColB'] = pd.to_datetime(df['ColB'], format='%m/%d/%Y')
s = (df.groupby(['ColA', df['ColB'] pd.offsets.DateOffset(days=1)])
['ColC'].sum().rename('ColD'))
df = df.merge(s, on=['ColA', 'ColB'], how='left')
print(df)
ColA ColB ColC ColD
0 A 2021-10-20 2 NaN
1 A 2021-10-20 8 NaN
2 A 2021-10-21 1 10.0
3 A 2021-10-21 5 10.0
4 A 2021-10-22 3 6.0
5 A 2021-10-22 4 6.0
6 B 2021-10-20 9 NaN
7 B 2021-10-20 1 NaN
8 B 2021-10-21 6 10.0
9 B 2021-10-21 4 10.0
10 B 2021-10-22 2 10.0
11 B 2021-10-22 7 10.0
12 C 2021-10-20 6 NaN
13 C 2021-10-20 3 NaN
14 C 2021-10-21 8 9.0
15 C 2021-10-21 8 9.0
16 C 2021-10-22 2 16.0
17 C 2021-10-22 6 16.0