Creating a date column using pandas

I have a dataframe like below.

Id	d_of_arr	d_of_sty
1	2021-12-03	2021-12-04
1	2021-12-03	2021-12-05
1	2021-12-03	2021-12-06
2	2021-12-09	2021-12-10
2	2021-12-09	2021-12-11

I want to add a column that shows the arrival date followed by every stay date for each Id, like below:

Id	dates
1	2021-12-03
1	2021-12-04
1	2021-12-05
1	2021-12-06
2	2021-12-09
2	2021-12-10
2	2021-12-11

How can I do this using Python/pandas?

CodePudding user response:

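If performance matters or the DataFrame is large, repeat each row with Index.repeat, using the difference in days between the two dates (plus one) as the repeat count. Then add a per-row day offset, built with GroupBy.cumcount and to_timedelta, to the arrival date, and finally sort and remove duplicates: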

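import pandas as pd

# Convert both columns to datetime64 so date arithmetic works.
df['d_of_arr'] = pd.to_datetime(df['d_of_arr'])
df['d_of_sty'] = pd.to_datetime(df['d_of_sty'])

# Repeat each row once per day from arrival through the stay date (inclusive).
df = df.loc[df.index.repeat(df['d_of_sty'].sub(df['d_of_arr']).dt.days.add(1))]
# cumcount numbers the copies of each original row 0, 1, 2, ...; add that many days.
df['dates'] = df['d_of_arr'].add(pd.to_timedelta(df.groupby(level=0).cumcount(), unit='D'))

# Sort, then drop dates that repeat within the same Id.
df1 = df[['Id','dates']].sort_values(['Id','dates']).drop_duplicates(ignore_index=True)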
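To make the intermediate steps easier to follow, here is a minimal self-contained sketch of the same Index.repeat approach run on the sample data from the question. The names n_days, expanded, offset and result are only illustrative, and the sketch assumes the DataFrame has a unique index (the default RangeIndex here), since the cumcount step groups on index labels.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "d_of_arr": ["2021-12-03"] * 3 + ["2021-12-09"] * 2,
    "d_of_sty": ["2021-12-04", "2021-12-05", "2021-12-06",
                 "2021-12-10", "2021-12-11"],
})
df["d_of_arr"] = pd.to_datetime(df["d_of_arr"])
df["d_of_sty"] = pd.to_datetime(df["d_of_sty"])

# Number of days each row spans, counting both endpoints.
n_days = df["d_of_sty"].sub(df["d_of_arr"]).dt.days.add(1)

# Repeat each row once per day. The original index labels are kept,
# so all copies of one row share the same label.
expanded = df.loc[df.index.repeat(n_days)]

# 0, 1, 2, ... within each original row, used as a day offset.
offset = expanded.groupby(level=0).cumcount()
expanded = expanded.assign(
    dates=expanded["d_of_arr"] + pd.to_timedelta(offset, unit="D")
)

# Keep one row per (Id, date) pair, in order.
result = (expanded[["Id", "dates"]]
          .sort_values(["Id", "dates"])
          .drop_duplicates(ignore_index=True))
print(result)
```

Unlike the snippet above, this version keeps the original df intact by writing the repeated rows to a separate variable, which may be convenient if the input is needed again later.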

Or, if the DataFrame is small or performance is not important, build a date range per row with a list comprehension and use DataFrame.explode to turn each date into its own row:

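# One DatetimeIndex per row, from arrival through the stay date (inclusive).
df['dates'] = [pd.date_range(s, e) for s, e in zip(df['d_of_arr'], df['d_of_sty'])]

# explode gives each date its own row; then sort and drop repeated dates.
df1 = (df.explode('dates')[['Id','dates']]
         .sort_values(['Id','dates'])
         .drop_duplicates(ignore_index=True))
print(df1)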
   Id      dates
0   1 2021-12-03
1   1 2021-12-04
2   1 2021-12-05
3   1 2021-12-06
4   2 2021-12-09
5   2 2021-12-10
6   2 2021-12-11
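
For reuse, the explode approach could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original answer: the function name expand_stays is hypothetical, and it assumes the column names from the question. It converts the exploded column back to datetime explicitly, as a precaution in case explode leaves it with object dtype.

```python
import pandas as pd


def expand_stays(df):
    """Return one row per (Id, date), from arrival through each stay date."""
    # One DatetimeIndex per row, inclusive of both endpoints.
    ranges = [pd.date_range(s, e)
              for s, e in zip(df["d_of_arr"], df["d_of_sty"])]
    # assign returns a new frame, so the caller's DataFrame is not modified.
    out = df[["Id"]].assign(dates=ranges).explode("dates")
    # Precaution: make sure the exploded column is datetime64.
    out["dates"] = pd.to_datetime(out["dates"])
    return (out.sort_values(["Id", "dates"])
               .drop_duplicates(ignore_index=True))


# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "d_of_arr": ["2021-12-03"] * 3 + ["2021-12-09"] * 2,
    "d_of_sty": ["2021-12-04", "2021-12-05", "2021-12-06",
                 "2021-12-10", "2021-12-11"],
})
df1 = expand_stays(df)
print(df1)
```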