I got two dataframes like this
df1
date value
0 2019-Jan 200
1 2019-Feb 200
2 2019-Mrz 300
3 2019-Apr 300
4 2019-Mai 400
5 2019-Jun 400
6 2019-Jul 500
7 2019-Aug 500
8 2019-Sep 600
9 2019-Okt 600
10 2019-Nov 700
11 2019-Dez 700
df2
date value
0 2020-Jan 100
1 2020-Feb 200
2 2020-Mrz 300
3 2020-Apr 100
4 2020-Mai 200
5 2020-Jun 300
6 2020-Jul 100
7 2020-Aug 200
8 2020-Sep 300
9 2020-Okt 100
10 2020-Nov 200
11 2020-Dez 300
12 2021-Jan 100
13 2021-Feb 200
14 2021-Mrz 300
15 2021-Apr 100
16 2021-Mai 200
17 2021-Jun 300
18 2021-Jul 100
19 2021-Aug 200
20 2021-Sep 300
What I would like to get is a new dataframe with the difference of each month in df2 to the corresponding month in df1. It should look like this
output
date value
0 2020-Jan -100
1 2020-Feb 0
2 2020-Mrz 0
3 2020-Apr -200
4 2020-Mai -200
5 2020-Jun -100
6 2020-Jul -400
7 2020-Aug -300
8 2020-Sep -300
9 2020-Okt -500
10 2020-Nov -500
11 2020-Dez -400
12 2021-Jan -100
13 2021-Feb 0
14 2021-Mrz 0
15 2021-Apr -200
16 2021-Mai -200
17 2021-Jun -100
18 2021-Jul -400
19 2021-Aug -300
20 2021-Sep -300
As df2 gets bigger over time (when the October, November, December values come in), I would like to have a solution that keeps working no matter what the number of rows in df2 is. I was wondering if there is a way to subtract df1 from the first twelve rows of df2 and then start again with rows 13 and following.
(Obviously, in the real data the numbers are not distributed evenly as in this example)
CodePudding user response:
new_col = df2['value'].sub(df2['date'].str.split('-').str[-1].map(dict(zip(df1['date'].str.split('-').str[-1], df1['value']))))
Output:
>>> new_col
0 -100
1 0
2 0
3 -200
4 -200
5 -100
6 -400
7 -300
8 -300
9 -500
10 -500
11 -400
12 -100
13 0
14 0
15 -200
16 -200
17 -100
18 -400
19 -300
20 -300
dtype: int64