I am trying to generate all months between two dates columns. I have the following dataframe:
StartDate | EndDate |
---|---|
01-15-2010 | 08-15-2010 |
07-01-2009 | 01-13-2010 |
My desired out put would be adding a month column that includes all months available between StartDate and EndDate columns.
Output:
StartDate | EndDate | Month |
---|---|---|
01-15-2010 | 08-15-2010 | 1,2,3,4,5,6,7,8 |
07-01-2009 | 01-13-2010 | 7,8,9,10,11,12,1 |
I tried writing this code but it's not working as I wanted.
date_range= lambda x:range (x['StartDate'].month,x['EndDate'].month 1)
df=df.assign(month=df.apply(date_range, axis=1)
CodePudding user response:
We can do this using dateutil.rrule
Code:
from dateutil.rrule import rrule, MONTHLY
def month_between_dates(start_date, end_date):
months_between = [str(dt.month) for dt in rrule(MONTHLY,
dtstart = start_date.replace(day=1),
until = end_date.replace(day=1))]
return ",".join(months_between)
in_df["Month"] = in_df.apply(lambda x: month_between_dates(x["StartDate"],
x["EndDate"]), axis=1)
Output:
StartDate EndDate Month
2010-01-15 2010-08-15 1,2,3,4,5,6,7,8
2009-07-01 2010-01-15 7,8,9,10,11,12,1
Code explanation:
months_between = [str(dt.month) for dt in rrule(MONTHLY,
dtstart = start_date.replace(day=1),
until = end_date.replace(day=1))]
start_date.replace(day=1) converting date to first day.
for dt in rrule(MONTHLY, dtstart = start_date,until = end_date) iterating over the months between starting and ending dates.
CodePudding user response:
Create months periods with datetimes by Series.dt.to_period
and then create period_range
per zipped Series
with extract months:
s = pd.to_datetime(df['StartDate']).dt.to_period('m')
e = pd.to_datetime(df['EndDate']).dt.to_period('m')
df=df.assign(month=[pd.period_range(y, x).month.tolist() for x, y in zip(e, s)])
print (df)
StartDate EndDate month
0 01-15-2010 08-15-2010 [1, 2, 3, 4, 5, 6, 7, 8]
1 07-01-2009 01-13-2010 [7, 8, 9, 10, 11, 12, 1]
If need strings joined by ,
use:
df=df.assign(month=[','.join(pd.period_range(y, x).month.astype(str))
for x, y in zip(e, s)])
print (df)
StartDate EndDate month
0 01-15-2010 08-15-2010 1,2,3,4,5,6,7,8
1 07-01-2009 01-13-2010 7,8,9,10,11,12,1