Home > database >  Pandas: Subtracting a dataframe from a smaller dataframe: Start again when out of rows
Pandas: Subtracting a dataframe from a smaller dataframe: Start again when out of rows

Time:12-08

I got two dataframes like this

df1

    date    value
0   2019-Jan    200
1   2019-Feb    200
2   2019-Mrz    300
3   2019-Apr    300
4   2019-Mai    400
5   2019-Jun    400
6   2019-Jul    500
7   2019-Aug    500
8   2019-Sep    600
9   2019-Okt    600
10  2019-Nov    700
11  2019-Dez    700

df2

    date    value
0   2020-Jan    100
1   2020-Feb    200
2   2020-Mrz    300
3   2020-Apr    100
4   2020-Mai    200 
5   2020-Jun    300
6   2020-Jul    100
7   2020-Aug    200
8   2020-Sep    300
9   2020-Okt    100
10  2020-Nov    200
11  2020-Dez    300
12  2021-Jan    100
13  2021-Feb    200
14  2021-Mrz    300
15  2021-Apr    100
16  2021-Mai    200
17  2021-Jun    300
18  2021-Jul    100
19  2021-Aug    200
20  2021-Sep    300

What I would like to get is a new dataframe with the difference of each month in df2 to the corresponding month in df1. It should look like this

output

    date    value
0   2020-Jan    -100
1   2020-Feb    0
2   2020-Mrz    0
3   2020-Apr    -200
4   2020-Mai    -200
5   2020-Jun    -100
6   2020-Jul    -400
7   2020-Aug    -300
8   2020-Sep    -300
9   2020-Okt    -500
10  2020-Nov    -500
11  2020-Dez    -400
12  2021-Jan    -100
13  2021-Feb    0
14  2021-Mrz    0
15  2021-Apr    -200
16  2021-Mai    -200
17  2021-Jun    -100
18  2021-Jul    -400
19  2021-Aug    -300
20  2021-Sep    -300

As df2 gets bigger over time (when the October, November, December values come in), I would like to have a solution that keeps working no matter what the number of rows in df2 is. I was wondering if there is a way to subtract df1 from the first twelve rows of df2 and then start again with rows 13 and following.

(Obviously, in the real data the numbers are not distributed evenly as in this example)

CodePudding user response:

new_col = df2['value'].sub(df2['date'].str.split('-').str[-1].map(dict(zip(df1['date'].str.split('-').str[-1], df1['value']))))

Output:

>>> new_col
0    -100
1       0
2       0
3    -200
4    -200
5    -100
6    -400
7    -300
8    -300
9    -500
10   -500
11   -400
12   -100
13      0
14      0
15   -200
16   -200
17   -100
18   -400
19   -300
20   -300
dtype: int64
  • Related