mydata = [{'ID' : '10', 'StartDate': '10/10/2016', 'EndDate': '15/10/2016'},
{'ID' : '20', 'StartDate': '10/10/2016', 'EndDate': '18/10/2016'}]
df = pd.DataFrame(mydata)
df['StartDate'] = pd.to_datetime(df['StartDate']).dt.date
df['EndDate'] = pd.to_datetime(df['EndDate']).dt.date
df = df.loc[df.index.repeat((df['EndDate'] - df['StartDate']).dt.days 1)]
df['Date'] = df['StartDate'] pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d')
I am getting Performance warning :
PerformanceWarning: Adding/subtracting object-dtype array to TimedeltaArray not vectorized.
What can I do to make it vectorized?
CodePudding user response:
For vectorized solution need datetimes instead dates
:
df['StartDate'] = pd.to_datetime(df['StartDate'], dayfirst=True)
df['EndDate'] = pd.to_datetime(df['EndDate'], dayfirst=True)
If there are times and need remove them use Series.dt.normalize
instead dt.date
:
df['StartDate'] = pd.to_datetime(df['StartDate'], dayfirst=True).dt.normalize()
df['EndDate'] = pd.to_datetime(df['EndDate'], dayfirst=True).dt.normalize()
df = df.loc[df.index.repeat((df['EndDate'] - df['StartDate']).dt.days 1)]
df['Date'] = df['StartDate'] pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d')
df = df.reset_index(drop=True)
print (df)
ID StartDate EndDate Date
0 10 2016-10-10 2016-10-15 2016-10-10
1 10 2016-10-10 2016-10-15 2016-10-11
2 10 2016-10-10 2016-10-15 2016-10-12
3 10 2016-10-10 2016-10-15 2016-10-13
4 10 2016-10-10 2016-10-15 2016-10-14
5 10 2016-10-10 2016-10-15 2016-10-15
6 20 2016-10-10 2016-10-18 2016-10-10
7 20 2016-10-10 2016-10-18 2016-10-11
8 20 2016-10-10 2016-10-18 2016-10-12
9 20 2016-10-10 2016-10-18 2016-10-13
10 20 2016-10-10 2016-10-18 2016-10-14
11 20 2016-10-10 2016-10-18 2016-10-15
12 20 2016-10-10 2016-10-18 2016-10-16
13 20 2016-10-10 2016-10-18 2016-10-17
14 20 2016-10-10 2016-10-18 2016-10-18
CodePudding user response:
This should fix the issue.
pandas
complained about dayfirst not set for dates like "%d/%m/%Y".
Removed .dt.date
that prevent pandas to set dtype to datetime64
and caused the warning.
mydata = [{'ID' : '10', 'StartDate': '10/10/2016', 'EndDate': '15/10/2016'}, {'ID' : '20', 'StartDate': '10/10/2016', 'EndDate': '18/10/2016'}]
df = pd.DataFrame(mydata)
df['StartDate'] = pd.to_datetime(df['StartDate'], dayfirst=True)
df['EndDate'] = pd.to_datetime(df['EndDate'], dayfirst=True)
df = df.loc[df.index.repeat((df['EndDate'] - df['StartDate']).dt.days 1)]
df['Date'] = df['StartDate'] pd.to_timedelta(df.groupby(level=0).cumcount(), unit='d')
Before
StartDate object
EndDate object
After
StartDate datetime64[ns]
EndDate datetime64[ns]