Suppose I have the following two DFs:
DF A: First column is a date, and then there are columns that start with a year (2021, 2022...)
Date 2021.Water 2021.Gas 2022.Electricity
may-04 500 470 473
may-05 520 490 493
may-06 540 510 513
DF B: First column is a date, and then there are columns that start with a year (2021, 2022...)
Date 2021.Amount 2022.Amount
may-04 100 95
may-05 110 105
may-06 120 115
The expected result is a DF with the columns from DF A, but that have the rows divided by the values for the matching year in DF B. Such as:
Date 2021.Water 2021.Gas 2022.Electricity
may-04 5.0 4.7 5.0
may-05 4.7 4.5 4.7
may-06 4.5 4.3 4.5
I am really struggling with this problem. Let me know if any clarifications are needed and will be glad to help.
CodePudding user response:
Try this:
dfai = dfa.set_index('Date')
dfai.columns = dfai.columns.str.split('.', expand=True)
dfbi = dfb.set_index('Date').rename(columns = lambda x: x.split('.')[0])
df_out = dfai.div(dfbi, level=0).round(1)
df_out.columns = df_out.columns.map('.'.join)
df_out.reset_index()
Output:
Date 2021.Water 2021.Gas 2022.Electricity
0 may-04 5.0 4.7 5.0
1 may-05 4.7 4.5 4.7
2 may-06 4.5 4.2 4.5
Details
First, move 'Date' into the index of both dataframes, then use string split to get years into a level in each dataframe.
Use, pd.DataFrame.div with level=0 to align operations on the top level index of each dataframe.
Flatten multiindex column header back to a single level and reset_index.