Have a dataset with a duration column with time data listed as an object shown below
df['duration'].head(10)
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
3 52 min.
4 1 hr. 25 min.
5 45 min.
6 45 min.
7 60 min.
8 45 min.
9 45 min.
Name: duration, dtype: object
How do I change this to an appropriate numerical value, like below?
0 00:60
1 01:13
2 01:10
3 00:52
4 01:25
5 00:45
CodePudding user response:
Here is a way to get a string version in %H:%M
format and a timedelta version:
import pandas as pd
df = pd.DataFrame({'duration':['60 min.', '1 hr. 13 min.', '1 hr. 10 min.']})
print(df)
df['parts']=df.duration.str.findall('\d ')
df['timedelta']=df.parts.apply(lambda x: pd.to_timedelta((0 if len(x) < 2 else int(x[0])) * 3600 int(x[-1])*60, unit='s'))
df['hours and minutes']=df.parts.apply(lambda x: f"{0 if len(x) < 2 else int(x[0]):02}:{int(x[-1]):02}")
df = df.drop(columns=['duration', 'parts'])
print(df)
Input:
duration
0 60 min.
1 1 hr. 13 min.
2 1 hr. 10 min.
Output:
timedelta hours and minutes
0 0 days 01:00:00 00:60
1 0 days 01:13:00 01:13
2 0 days 01:10:00 01:10
If we do this:
print(df.timedelta.dtypes)
... we see that the timedelta column indeed contains numerical values (of timedelta data type):
timedelta64[ns]
CodePudding user response:
You could apply a lambda function on your duration
column like this:
import pandas as pd
import datetime as dt
def transform(t):
if 'hr.' in t:
return dt.datetime.strptime(t, '%I hr. %M min.').strftime('%I:%M')
return dt.datetime.strptime(t, '%M min.').strftime('00:%M')
df = pd.DataFrame(['45 min.', '1 hr. 13 min.'], columns=['duration'])
print(df)
df['duration'] = df['duration'].apply(lambda x: transform(x))
print(df)
Outputs:
duration
0 45 min.
1 1 hr. 13 min.
and then
duration
0 00:45
1 01:13
Note that if you want "60 min." mapped into "00:60", then you need some additional logic in the transform
function, since the minutes format %M
only takes values between 00-59.