I have a column of durations, some in hours and some in minutes, and I would like every value in the column to be in hours. How do I divide only the values given in minutes by 60 to get hours? I tried splitting the column on the space to separate the numbers from the unit strings, but I got stuck on how to achieve the desired outcome.
CodePudding user response:
Use a lambda with split. Both branches return a float, so the column ends up numeric rather than a mix of numbers and strings.
df["content_duration"] = df["content_duration"].apply(
    lambda x: round(float(x.split(" ")[0]) / 60, 2) if x.split(" ")[1] == "mins" else float(x.split(" ")[0])
)
print(df)
   content_duration
0              1.50
1              1.00
2              1.50
3              1.00
4              0.62
5              0.73
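If the one-line lambda is hard to read, the same split-and-divide logic can be pulled out into a small named function. This is only a sketch: `to_hours` is a hypothetical helper name, and it assumes every value looks like `<number> <unit>` with a unit that starts with `min` or `hour`.

```python
def to_hours(text: str) -> float:
    """Convert strings like '37 mins' or '1.5 hours' to a number of hours."""
    value, unit = text.split(" ", 1)
    hours = float(value)
    # Only minute values need rescaling; hour values pass through unchanged.
    if unit.startswith("min"):
        hours /= 60
    elif not unit.startswith("hour"):
        raise ValueError(f"unexpected unit in {text!r}")
    return round(hours, 2)


samples = ["1.5 hours", "1 hour", "37 mins", "44 mins"]
converted = [to_hours(s) for s in samples]
print(converted)  # [1.5, 1.0, 0.62, 0.73]
```

It can be passed to apply in place of the lambda: df["content_duration"].apply(to_hours).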
CodePudding user response:
Use replace() with regex=True to turn each unit into the arithmetic that converts it to hours (' mins' becomes '/60' and ' hours' becomes '*1'). Then apply pd.eval to each value to evaluate the resulting expression, and round to the desired number of decimal places.
import pandas as pd

# Create the dataframe
df = pd.DataFrame({"content_duration": ['1.5 hours', '1 hour', '1.5 hours', '1 hour', '37 mins', '44 mins']})
# Turn each value into an arithmetic expression, evaluate it, then round
df['content_duration'] = (
    df['content_duration']
    .replace({' mins?': '/60', ' hours?': '*1'}, regex=True)
    .apply(pd.eval)
    .round(1)
)
# Print the dataframe
print(df)
OUTPUT:
   content_duration
0               1.5
1               1.0
2               1.5
3               1.0
4               0.6
5               0.7
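To make the replace step concrete, the sketch below reproduces the same two substitutions with Python's re module on plain strings, so the intermediate expressions that get handed to pd.eval are visible. The `to_expression` helper is a hypothetical name used only for illustration; it is not part of pandas.

```python
import re

# Same patterns as in the replace() call above: unit -> arithmetic suffix.
CONVERSIONS = {r" mins?": "/60", r" hours?": "*1"}


def to_expression(text: str) -> str:
    """Rewrite '37 mins' as '37/60' and '1.5 hours' as '1.5*1'."""
    for pattern, suffix in CONVERSIONS.items():
        text = re.sub(pattern, suffix, text)
    return text


expressions = [to_expression(s) for s in ["1.5 hours", "1 hour", "37 mins"]]
print(expressions)  # ['1.5*1', '1*1', '37/60']
```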
CodePudding user response:
pandas' to_timedelta is very good at converting this; you just need to remove the trailing s from hours/mins:
df['hours'] = (pd.to_timedelta(df['content_duration']
                               .str.replace(r's\b', '', regex=True))
                 .dt.total_seconds().div(3600)
                 .round(2)  # optional
              )
Output:
   content_duration  hours
0         1.5 hours   1.50
1            1 hour   1.00
2         1.5 hours   1.50
3            1 hour   1.00
4           37 mins   0.62
5           44 mins   0.73
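The pattern s\b matches an s only when it is immediately followed by a word boundary, which is why it strips the plural ending and nothing else in these values. A quick sketch with Python's re module (standing in here for .str.replace with regex=True) shows the effect on the sample values:

```python
import re

samples = ["1.5 hours", "1 hour", "37 mins", "44 mins"]

# Remove an "s" that ends a word, so "hours" -> "hour" and "mins" -> "min".
singular = [re.sub(r"s\b", "", s) for s in samples]
print(singular)  # ['1.5 hour', '1 hour', '37 min', '44 min']
```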
To have strings:
df['hours'] = (pd.to_timedelta(df['content_duration'].str.replace(r's\b', '', regex=True))
                 .dt.total_seconds().div(3600).round(2)
                 .astype(str).add(' hours')
              )
Output:
   content_duration       hours
0         1.5 hours   1.5 hours
1            1 hour   1.0 hours
2         1.5 hours   1.5 hours
3            1 hour   1.0 hours
4           37 mins  0.62 hours
5           44 mins  0.73 hours