I have a pandas dataframe like this:
Date | Route | Col1 | Col2 |
---|---|---|---|
01/01/2020 | A | 99 | 26 |
01/02/2020 | A | 37 | 96 |
01/03/2020 | A | 20 | 83 |
01/01/2021 | A | 50 | 79 |
01/02/2021 | A | 16 | 50 |
01/03/2021 | A | 52 | 33 |
01/01/2020 | B | 48 | 44 |
01/02/2020 | B | 96 | 30 |
01/03/2020 | B | 17 | 42 |
01/01/2021 | B | 34 | 74 |
01/02/2021 | B | 49 | 37 |
01/03/2021 | B | 70 | 12 |
I need to get the ratio for Col1 and Col2 based on 2020 and 2021 values. That is, a new column for each month that has Col2/Col1.
I have tried a few approaches (pct_change, diff), but I am unable to get the right result. I also made a DateTimeIndex to help me with this, but wasn't able to get the ratio.
This is the gist of what I am trying to do, but not sure how to frame the .loc command in this scenario:
df['Col1_ratio'] = df['Col2'] / df.loc(<get index of previous year's row>)['Col1']
This is the expected output dataframe:
Date | Route | Col1 | Col2 | Col1_ratio | Col2_ratio |
---|---|---|---|---|---|
01/01/2020 | A | 99 | 26 | NA | NA |
01/02/2020 | A | 37 | 96 | NA | NA |
01/03/2020 | A | 20 | 83 | NA | NA |
01/01/2021 | A | 50 | 79 | 0.505050505 | 3.038461538 |
01/02/2021 | A | 16 | 50 | 0.432432432 | 0.520833333 |
01/03/2021 | A | 52 | 33 | 2.6 | 0.397590361 |
01/01/2020 | B | 48 | 44 | NA | NA |
01/02/2020 | B | 96 | 30 | NA | NA |
01/03/2020 | B | 17 | 42 | NA | NA |
01/01/2021 | B | 34 | 74 | 0.708333333 | 1.681818182 |
01/02/2021 | B | 49 | 37 | 0.510416667 | 1.233333333 |
01/03/2021 | B | 70 | 12 | 4.117647059 | 0.285714286 |
E.g. Col1_ratio is Col1_JAN21 / Col1_JAN20, and so on.
CodePudding user response:
Idea is create MultiIndex
and then add one year for datetimes in rename
, divide columns and add suffix by DataFrame.add_suffix
, last add to original by DataFrame.join
:
df1 = df.set_index(['Date','Route'])
df2 = df1.rename(lambda x: x pd.offsets.DateOffset(years=1), level=0)
df3 = df1[['Col1','Col2']].div(df2[['Col1','Col2']]).add_suffix('_ratio')
df = df.join(df3, on=['Date','Route'])
print (df)
Date Route Col1 Col2 Col1_ratio Col2_ratio
0 2020-01-01 A 99 26 NaN NaN
1 2020-01-02 A 37 96 NaN NaN
2 2020-01-03 A 20 83 NaN NaN
3 2021-01-01 A 50 79 0.505051 3.038462
4 2021-01-02 A 16 50 0.432432 0.520833
5 2021-01-03 A 52 33 2.600000 0.397590
6 2020-01-01 B 48 44 NaN NaN
7 2020-01-02 B 96 30 NaN NaN
8 2020-01-03 B 17 42 NaN NaN
9 2021-01-01 B 34 74 0.708333 1.681818
10 2021-01-02 B 49 37 0.510417 1.233333
11 2021-01-03 B 70 12 4.117647 0.285714