I have a data frame that looks like this:
I calculated the duration by using the following code:
df['dropoff_time'] = pd.to_datetime(df['tpep_dropoff_datetime'])
df['pickup_time'] = pd.to_datetime(df['tpep_pickup_datetime'])
df['duration'] = df['dropoff_time'] - df['pickup_time']
I am trying to convert the duration of a taxi ride from timedelta64 to float using the following code:
df['duration'] = df[:5]['duration'] / np.timedelta64(1, 's')
However, the second time I run the code above to convert from timedelta64 to float, I keep getting this message:
Below is a picture showing the datatypes of each column:
So I am getting the float type for the duration column, which is what I want. However, some of the rows return NaN, as shown in the picture. I don't understand why this happens or how to fix it. Can someone please help?
CodePudding user response:
The problem is that you select only the first 5 rows with [:5], so only those 5 values are converted, and pandas adds NaNs for all the other rows:
df['duration'] = df[:5]['duration'] / np.timedelta64(1, 's')
                 ^^^^^^
                 here
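To see the effect in isolation, here is a minimal sketch. The 8-row frame and its timestamps are made up for illustration; only the column names come from the question. Assigning a 5-row Series back to the column aligns on the index, so the rows without a match end up as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical 8-row frame; the timestamps are invented for this example.
pickup = pd.date_range("2021-01-01", periods=8, freq="D")
df = pd.DataFrame({"pickup_time": pickup})
df["dropoff_time"] = df["pickup_time"] + pd.Timedelta(minutes=10)
df["duration"] = df["dropoff_time"] - df["pickup_time"]

# The right-hand side contains only 5 rows (index 0..4).
partial = df[:5]["duration"] / np.timedelta64(1, "s")

# Column assignment aligns on the index: rows 5..7 have no match and become NaN.
df["duration"] = partial
print(df["duration"])
```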
So the solution is to remove [:5]:
df['duration'] = (df['dropoff_time'] - df['pickup_time'])/ pd.Timedelta("1s")
Or:
df['duration'] = (df['dropoff_time'] - df['pickup_time']).dt.total_seconds()
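
As a quick check, the following sketch runs both variants on two made-up rows (the timestamps are assumptions, the column names are the ones from the question) and shows they give the same float seconds. Because the duration is recomputed from the datetime columns each time, the assignment can be re-run without changing the result.

```python
import pandas as pd

# Hypothetical sample rows; only the column names are taken from the question.
df = pd.DataFrame({
    "tpep_pickup_datetime": ["2021-01-01 00:00:00", "2021-01-01 01:00:00"],
    "tpep_dropoff_datetime": ["2021-01-01 00:10:30", "2021-01-01 01:25:00"],
})
df["pickup_time"] = pd.to_datetime(df["tpep_pickup_datetime"])
df["dropoff_time"] = pd.to_datetime(df["tpep_dropoff_datetime"])

# Compute the timedelta once, then convert it to seconds in two ways.
delta = df["dropoff_time"] - df["pickup_time"]
by_division = delta / pd.Timedelta("1s")   # divide by a one-second Timedelta
by_accessor = delta.dt.total_seconds()     # use the .dt accessor

df["duration"] = by_accessor
print(df["duration"].tolist())
```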