Home > Mobile >  Generate all Months between two dates column
Generate all Months between two dates column

Time:01-20

I am trying to generate all months between two dates columns. I have the following dataframe:

StartDate EndDate
01-15-2010 08-15-2010
07-01-2009 01-13-2010

My desired out put would be adding a month column that includes all months available between StartDate and EndDate columns.

Output:

StartDate EndDate Month
01-15-2010 08-15-2010 1,2,3,4,5,6,7,8
07-01-2009 01-13-2010 7,8,9,10,11,12,1

I tried writing this code but it's not working as I wanted.

date_range= lambda x:range (x['StartDate'].month,x['EndDate'].month 1)
df=df.assign(month=df.apply(date_range, axis=1)

CodePudding user response:

We can do this using dateutil.rrule

Code:

from dateutil.rrule import rrule, MONTHLY
def month_between_dates(start_date, end_date):
    months_between = [str(dt.month) for dt in rrule(MONTHLY, 
                                dtstart = start_date.replace(day=1), 
                                until = end_date.replace(day=1))]
    return ",".join(months_between)
in_df["Month"] = in_df.apply(lambda x: month_between_dates(x["StartDate"],
                                                           x["EndDate"]), axis=1)

Output:

StartDate    EndDate            Month
2010-01-15 2010-08-15  1,2,3,4,5,6,7,8
2009-07-01 2010-01-15 7,8,9,10,11,12,1

Code explanation:

months_between = [str(dt.month) for dt in rrule(MONTHLY, 
                            dtstart = start_date.replace(day=1), 
                            until = end_date.replace(day=1))]

start_date.replace(day=1) converting date to first day.

for dt in rrule(MONTHLY, dtstart = start_date,until = end_date) iterating over the months between starting and ending dates.

CodePudding user response:

Create months periods with datetimes by Series.dt.to_period and then create period_range per zipped Series with extract months:

s = pd.to_datetime(df['StartDate']).dt.to_period('m')
e = pd.to_datetime(df['EndDate']).dt.to_period('m')

df=df.assign(month=[pd.period_range(y, x).month.tolist() for x, y in zip(e, s)])

print (df)
    StartDate     EndDate                     month
0  01-15-2010  08-15-2010  [1, 2, 3, 4, 5, 6, 7, 8]
1  07-01-2009  01-13-2010  [7, 8, 9, 10, 11, 12, 1]

If need strings joined by , use:

df=df.assign(month=[','.join(pd.period_range(y, x).month.astype(str)) 
                    for x, y in zip(e, s)])

print (df)
    StartDate     EndDate             month
0  01-15-2010  08-15-2010   1,2,3,4,5,6,7,8
1  07-01-2009  01-13-2010  7,8,9,10,11,12,1
  • Related