I have extracted YouTube data, and the video lengths in the results come in different formats. Here is a sample of the raw data:
length
4:26:00
1:02:23
9:31
1:21
How do I convert these results to minutes only?
The values are stored in data['length']. I have tried:
pd.to_datetime(data['length'], format='%H:%M:%S')
But I get the error
ValueError: time data '4:26' does not match format '%H:%M:%S' (match)
CodePudding user response:
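Use dateutil.parser. Note that it parses each string as a time of day, so "9:31" is read as 09:31:00 (hours and minutes) rather than 9 minutes 31 seconds: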
from dateutil import parser
times = ["4:26:00", "1:02:23", "9:31", "1:21"]
parsed_times = [parser.parse(t).time() for t in times]
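Since the question asks for minutes, a plain-Python alternative is to split each string on the colons and weight the fields yourself. This is only a sketch: to_minutes is a hypothetical helper name, and it assumes that two-field values are MM:SS and three-field values are H:MM:SS.

```python
def to_minutes(length: str) -> float:
    """Convert 'H:MM:SS' or 'M:SS' to minutes (hypothetical helper)."""
    fields = [int(part) for part in length.strip().split(":")]
    # Assumption: a missing leading field means zero hours.
    while len(fields) < 3:
        fields.insert(0, 0)
    hours, minutes, seconds = fields
    return hours * 60 + minutes + seconds / 60


times = ["4:26:00", "1:02:23", "9:31", "1:21"]
minutes = [to_minutes(t) for t in times]
```

Here "4:26:00" comes out as 266.0 minutes; the other values keep their fractional part unless you round them.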
CodePudding user response:
Using pandas:
df['length'] = df['length'].str.strip()
df['length'] = pd.to_datetime(df['length'], format='%H:%M:%S', errors='coerce').fillna(pd.to_datetime(df['length'], format='%M:%S', errors='coerce'))
output:
length
0 1900-01-01 04:26:00
1 1900-01-01 01:02:23
2 1900-01-01 00:09:31
3 1900-01-01 00:01:21
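The datetimes above still are not minutes. One possible follow-up, sketched here under the assumption that no video runs 24 hours or longer (%H only accepts 0 to 23), is to combine the hour, minute and second components. The minutes column name is an illustrative choice, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({"length": ["4:26:00", "1:02:23", "9:31", "1:21"]})
s = df["length"].str.strip()

# Try H:M:S first, then fall back to M:S for the rows that failed.
parsed = pd.to_datetime(s, format="%H:%M:%S", errors="coerce").fillna(
    pd.to_datetime(s, format="%M:%S", errors="coerce")
)

# Ignore the placeholder date and keep only the time-of-day parts.
df["minutes"] = parsed.dt.hour * 60 + parsed.dt.minute + parsed.dt.second / 60
```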
CodePudding user response:
Instead of using datetime, you can use timedelta, since you are working with durations. For example:
df = pd.DataFrame({'length': ["4:26:00", "1:02:23", "9:31", "1:21"]})
# where the hour is missing we prepend it as zero
m = df['length'].str.len() < 6
df.loc[m, 'length'] = '00:' + df.loc[m, 'length']
df['length'] = pd.to_timedelta(df['length'])
df['length']
0 0 days 04:26:00
1 0 days 01:02:23
2 0 days 00:09:31
3 0 days 00:01:21
Name: length, dtype: timedelta64[ns]
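To get from timedeltas to the minutes the question asks for, one option is to divide the total seconds by 60. The sketch below is self-contained and makes two assumptions of its own: values with a single colon are MM:SS, and the result goes into a new minutes column (an illustrative name).

```python
import pandas as pd

df = pd.DataFrame({"length": ["4:26:00", "1:02:23", "9:31", "1:21"]})
s = df["length"].str.strip()

# Keep values that already have hours; prepend zero hours to the rest.
s = s.where(s.str.count(":") == 2, "00:" + s)

# total_seconds() keeps the seconds as a fraction of a minute.
df["minutes"] = pd.to_timedelta(s).dt.total_seconds() / 60
```

If whole minutes are enough, the column can be rounded or floored afterwards, for example with df["minutes"].round().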