I have been working on a dataframe where one of the columns (flight_time) contains flight durations. The strings come in 3 different formats, for example:
- "07 h 05 m"
- "13h 55m"
- "2h 23m"
I would like to change them all to HH:MM format and finally change the data type from object to time.
Can somebody tell me how to do this?
pandas has no time dtype. You can have a datetime64 (pd.DatetimeIndex) or a timedelta64 (pd.TimedeltaIndex). Since your values are durations, I think a timedelta64 column is the better fit, so you can use the pd.to_timedelta function:
import pandas as pd
df = pd.DataFrame({'flight_time': ['07 h 05 m', '13h 55m', '2h 23m']})
df['flight_time2'] = pd.to_timedelta(df['flight_time'])
print(df)
# Output
flight_time flight_time2
0 07 h 05 m 0 days 07:05:00
1 13h 55m 0 days 13:55:00
2 2h 23m 0 days 02:23:00
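If pd.to_timedelta rejects some other spelling in your real data, a more explicit sketch extracts the hour and minute digits with a regular expression and builds the timedelta from them. It assumes every value has both an "h" part and an "m" part, as in the three examples above; values that do not match become NaN and the astype(int) step will raise.

```python
import pandas as pd

df = pd.DataFrame({'flight_time': ['07 h 05 m', '13h 55m', '2h 23m']})

# Pull the hour and minute digits out of each string; \s* tolerates the
# optional spaces seen in the formats ("07 h 05 m", "13h 55m", "2h 23m").
# Assumption: every value contains both an "h" and an "m" component.
parts = df['flight_time'].str.extract(r'(\d+)\s*h\s*(\d+)\s*m')

# Build the timedelta explicitly from the two extracted numeric columns.
df['flight_time2'] = (
    pd.to_timedelta(parts[0].astype(int), unit='h')
    + pd.to_timedelta(parts[1].astype(int), unit='min')
)
print(df)
```

This trades pandas' built-in string parser for a pattern you control, which makes it easier to see (and adjust) exactly which formats are accepted.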
If you want each value to be a datetime.time object, use:
df['flight_time2'] = pd.to_datetime(df['flight_time'].str.findall(r'\d+')
                                    .str.join(':'), format='%H:%M').dt.time
print(df)
# Output
flight_time flight_time2
0 07 h 05 m 07:05:00
1 13h 55m 13:55:00
2 2h 23m 02:23:00
In this case, flight_time2 still has object dtype:
>>> df.dtypes
flight_time object
flight_time2 object
dtype: object
But each value is an instance of datetime.time:
>>> df.loc[0, 'flight_time2']
datetime.time(7, 5)
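Because these values are plain Python datetime.time objects, you cannot do arithmetic with them. A minimal sketch of getting a timedelta64 column back from such a column, assuming every value is a datetime.time (so that str() yields "HH:MM:SS"):

```python
import datetime

import pandas as pd

# A column of datetime.time objects, like flight_time2 in the second approach.
times = pd.Series([datetime.time(7, 5), datetime.time(13, 55), datetime.time(2, 23)])

# datetime.time does not support arithmetic: adding two times raises TypeError.
try:
    datetime.time(7, 5) + datetime.time(2, 23)
except TypeError as exc:
    print('time arithmetic not supported:', exc)

# Round-trip through "HH:MM:SS" strings to get a timedelta64 column back.
durations = pd.to_timedelta(times.astype(str))
print(durations.sum())
```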
With the first approach (timedelta64), you can use vectorized methods; with the second, that is not possible. Furthermore, you lose the .dt accessor.
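If you still need HH:MM strings (for display or export), one option is to keep the timedelta64 column and derive the strings from it with vectorized .dt operations. This is a sketch under the assumption that durations are whole minutes; the column built here stands in for flight_time2 from the first approach.

```python
import pandas as pd

# Timedelta column equivalent to flight_time2 from the first approach.
durations = pd.Series([
    pd.Timedelta(hours=7, minutes=5),
    pd.Timedelta(hours=13, minutes=55),
    pd.Timedelta(hours=2, minutes=23),
])

# Vectorized arithmetic through the .dt accessor: total minutes per flight.
total_minutes = (durations.dt.total_seconds() // 60).astype(int)

# HH:MM strings for display only; the timedelta column itself stays usable.
hhmm = (
    (total_minutes // 60).astype(str).str.zfill(2)
    + ':'
    + (total_minutes % 60).astype(str).str.zfill(2)
)
print(hhmm.tolist())
```

Deriving the hours from the total minutes, rather than from .dt.components, means a duration of 24 hours or more is shown as, for example, 25:10 instead of rolling over into a separate days field.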