Home > Enterprise >  All month ends until the end date
All month ends until the end date

Time:02-23

I have a df with two columns:

index   start_date   end_date
0       2000-01-03   2000-01-20
1       2000-01-04   2000-01-31
2       2000-01-05   2000-02-02 
3       2000-01-05   2000-02-17
...
5100    2020-12-29   2021-01-11
5111    2020-12-30   2021-03-15  

I would like to add columns of all month end dates between the start and end date, so that if the end_date is in the middle of a month, I would still take into account the end of this month. So, my df would look like this:

index   start_date   end_date     first_monthend   second_monthend third_monthend fourth_monthend
0       2000-01-03   2000-01-20   2000-01-31         0               0              0
1       2000-01-04   2000-01-31   2000-01-31         0               0              0
2       2000-01-05   2000-02-02   2000-01-31         2000-02-28      0              0
3       2000-01-05   2000-02-17   2000-01-31         2000-02-28      0              0
... ... ... ... ... ...
5100    2020-12-29   2021-02-11   2020-12-31         2021-01-31      2021-02-28     0
5111    2020-12-30   2021-03-15   2020-12-31         2021-01-31      2021-02-28    2021-03-31 
I would be very grateful if you could help me 

CodePudding user response:

If need parse months between start and end datetimes and add last day of each month use custom lambda function with period_range:

df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

def f(x):
    r = pd.period_range(x['start_date'], 
                        x['end_date'], freq='m').to_timestamp(how='end').normalize()
    return pd.Series(r)


df = df.join(df.apply(f, axis=1).fillna(0).add_suffix('_monthend'))
print (df)
     start_date   end_date 0_monthend           1_monthend  \
0    2000-01-03 2000-01-20 2000-01-31                    0   
1    2000-01-04 2000-01-31 2000-01-31                    0   
2    2000-01-05 2000-02-02 2000-01-31  2000-02-29 00:00:00   
3    2000-01-05 2000-02-17 2000-01-31  2000-02-29 00:00:00   
5100 2020-12-29 2021-01-11 2020-12-31  2021-01-31 00:00:00   
5111 2020-12-30 2021-03-15 2020-12-31  2021-01-31 00:00:00   

               2_monthend           3_monthend  
0                       0                    0  
1                       0                    0  
2                       0                    0  
3                       0                    0  
5100                    0                    0  
5111  2021-02-28 00:00:00  2021-03-31 00:00:00  

If not replace missing values by 0:

df = df.join(df.apply(f, axis=1).add_suffix('_monthend'))
print (df)
     start_date   end_date 0_monthend 1_monthend 2_monthend 3_monthend
0    2000-01-03 2000-01-20 2000-01-31        NaT        NaT        NaT
1    2000-01-04 2000-01-31 2000-01-31        NaT        NaT        NaT
2    2000-01-05 2000-02-02 2000-01-31 2000-02-29        NaT        NaT
3    2000-01-05 2000-02-17 2000-01-31 2000-02-29        NaT        NaT
5100 2020-12-29 2021-01-11 2020-12-31 2021-01-31        NaT        NaT
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 2021-02-28 2021-03-31
  • Related