I have a df with two columns:
index start_date end_date
0 2000-01-03 2000-01-20
1 2000-01-04 2000-01-31
2 2000-01-05 2000-02-02
3 2000-01-05 2000-02-17
...
5100 2020-12-29 2021-01-11
5111 2020-12-30 2021-03-15
I would like to add columns of all month end dates between the start and end date, so that if the end_date is in the middle of a month, I would still take into account the end of this month. So, my df would look like this:
index start_date end_date first_monthend second_monthend third_monthend fourth_monthend
0 2000-01-03 2000-01-20 2000-01-31 0 0 0
1 2000-01-04 2000-01-31 2000-01-31 0 0 0
2 2000-01-05 2000-02-02 2000-01-31 2000-02-28 0 0
3 2000-01-05 2000-02-17 2000-01-31 2000-02-28 0 0
... ... ... ... ... ...
5100 2020-12-29 2021-02-11 2020-12-31 2021-01-31 2021-02-28 0
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 2021-02-28 2021-03-31
I would be very grateful if you could help me
CodePudding user response:
If need parse months between start and end datetimes and add last day of each month use custom lambda function with period_range
:
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])
def f(x):
r = pd.period_range(x['start_date'],
x['end_date'], freq='m').to_timestamp(how='end').normalize()
return pd.Series(r)
df = df.join(df.apply(f, axis=1).fillna(0).add_suffix('_monthend'))
print (df)
start_date end_date 0_monthend 1_monthend \
0 2000-01-03 2000-01-20 2000-01-31 0
1 2000-01-04 2000-01-31 2000-01-31 0
2 2000-01-05 2000-02-02 2000-01-31 2000-02-29 00:00:00
3 2000-01-05 2000-02-17 2000-01-31 2000-02-29 00:00:00
5100 2020-12-29 2021-01-11 2020-12-31 2021-01-31 00:00:00
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 00:00:00
2_monthend 3_monthend
0 0 0
1 0 0
2 0 0
3 0 0
5100 0 0
5111 2021-02-28 00:00:00 2021-03-31 00:00:00
If not replace missing values by 0
:
df = df.join(df.apply(f, axis=1).add_suffix('_monthend'))
print (df)
start_date end_date 0_monthend 1_monthend 2_monthend 3_monthend
0 2000-01-03 2000-01-20 2000-01-31 NaT NaT NaT
1 2000-01-04 2000-01-31 2000-01-31 NaT NaT NaT
2 2000-01-05 2000-02-02 2000-01-31 2000-02-29 NaT NaT
3 2000-01-05 2000-02-17 2000-01-31 2000-02-29 NaT NaT
5100 2020-12-29 2021-01-11 2020-12-31 2021-01-31 NaT NaT
5111 2020-12-30 2021-03-15 2020-12-31 2021-01-31 2021-02-28 2021-03-31