Home > Software design >  Calculate months elapsed since start value in pandas dataframe
Calculate months elapsed since start value in pandas dataframe

Time:04-07

I have a dataframe that looks as such

df = {'CAL_YEAR':[2021,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2022,2023,2023]
'CAL_MONTH' :[12,1,2,3,4,5,6,7,8,9,10,11,12,1,2]}

I want to calculate a months elapsed columns which should look like this

df = {'CUM_MONTH':[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]}

how can I do this? my starting month would be 12/2021 or 12/31/2021 (do not care about dates here I only care about the months elapsed). This is economic scenario data but the format of the source data is not in the way we need it.

CodePudding user response:

IIUC:

multiplier = {'CAL_YEAR': 12, 'CAL_MONTH': 1}
df.assign(
    CUM_MONTH=df[multiplier].diff().mul(multiplier).sum(axis=1).cumsum()
)

    CAL_YEAR  CAL_MONTH  CUM_MONTH
0       2021         12        0.0
1       2022          1        1.0
2       2022          2        2.0
3       2022          3        3.0
4       2022          4        4.0
5       2022          5        5.0
6       2022          6        6.0
7       2022          7        7.0
8       2022          8        8.0
9       2022          9        9.0
10      2022         10       10.0
11      2022         11       11.0
12      2022         12       12.0
13      2023          1       13.0
14      2023          2       14.0

CodePudding user response:

I basically did the above method but in numerous steps. Did not use diff() , sum() and cumsum() functions.

start_year = int(data["VALUATION_DATE"][0][-4:])
    data = data.astype({"CAL_YEAR": "int","CAL_MONTH": "int"})
    data["CAL_YEAR_ELAPSED"] = data["CAL_YEAR"]  -  (start_year 1)
    data["CumMonths"] = data["CAL_MONTH"]   12 * data["CAL_YEAR_ELAPSED"]  1
  • Related