I have a dataframe that looks as such
df = {'CAL_YEAR':[2021,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2023,2023]
'CAL_MONTH' :[12,1,2,3,4,5,6,7,8,9,10,11,12,1,2]}
I want to calculate a months elapsed columns which should look like this
df = {'CUM_MONTH':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]}
how can I do this? my starting month would be 12/2021 or 12/31/2021 (do not care about dates here I only care about the months elapsed). This is economic scenario data but the format of the source data is not in the way we need it.
CodePudding user response:
IIUC:
multiplier = {'CAL_YEAR': 12, 'CAL_MONTH': 1}
df.assign(
CUM_MONTH=df[multiplier].diff().mul(multiplier).sum(axis=1).cumsum()
)
CAL_YEAR CAL_MONTH CUM_MONTH
0 2021 12 0.0
1 2022 1 1.0
2 2022 2 2.0
3 2022 3 3.0
4 2022 4 4.0
5 2022 5 5.0
6 2022 6 6.0
7 2022 7 7.0
8 2022 8 8.0
9 2022 9 9.0
10 2022 10 10.0
11 2022 11 11.0
12 2022 12 12.0
13 2023 1 13.0
14 2023 2 14.0
CodePudding user response:
I basically did the above method but in numerous steps. Did not use diff() , sum() and cumsum() functions.
start_year = int(data["VALUATION_DATE"][0][-4:])
data = data.astype({"CAL_YEAR": "int","CAL_MONTH": "int"})
data["CAL_YEAR_ELAPSED"] = data["CAL_YEAR"] - (start_year 1)
data["CumMonths"] = data["CAL_MONTH"] 12 * data["CAL_YEAR_ELAPSED"] 1