Different time formats in dataframe

Time:11-15

I have extracted YouTube data, and the video lengths come back in different formats. Here is a sample of the raw data:

  length
 4:26:00
 1:02:23
    9:31
    1:21

How do I convert these values to minutes only? The values are stored in data, and I have tried:

pd.to_datetime(data['length'], format='%H:%M:%S')

But I get the error

ValueError: time data '4:26' does not match format '%H:%M:%S' (match)

CodePudding user response:

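Use dateutil.parser. Note that it reads a two-field value such as "9:31" as hours and minutes (09:31), so M:SS values need a zero hour prepended first:

from dateutil import parser

times = ["4:26:00", "1:02:23", "9:31", "1:21"]

# pad M:SS values to 0:M:SS so they are not read as hours:minutes
padded = [t if t.count(":") == 2 else "0:" + t for t in times]

parsed_times = [parser.parse(t).time() for t in padded]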
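To get minutes directly without a parsing library, the fields can be split on ":" and weighted by position. This is only a minimal sketch: the helper name to_minutes is hypothetical, and it assumes every value is either H:MM:SS or M:SS.

```python
def to_minutes(length: str) -> float:
    """Convert 'H:MM:SS' or 'M:SS' to minutes (hypothetical helper)."""
    parts = [int(p) for p in length.strip().split(":")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected duration format: {length!r}")
    # Left-pad with a zero hour so that M:SS becomes [0, M, SS]
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    return hours * 60 + minutes + seconds / 60


times = ["4:26:00", "1:02:23", "9:31", "1:21"]
minutes = [to_minutes(t) for t in times]
```

With pandas, the same helper could be applied to the column with data['length'].map(to_minutes).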

CodePudding user response:

Using pandas:

df['length'] = df['length'].str.strip()

df['length'] = pd.to_datetime(df['length'], format='%H:%M:%S', errors='coerce').fillna(
    pd.to_datetime(df['length'], format='%M:%S', errors='coerce'))

Output:

               length
0 1900-01-01 04:26:00
1 1900-01-01 01:02:23
2 1900-01-01 00:09:31
3 1900-01-01 00:01:21
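
The parsed values carry a placeholder date of 1900-01-01, so they are timestamps rather than durations. One way to reduce them to minutes is to combine the time-of-day components. This sketch assumes no video is 24 hours or longer; the column name minutes is an assumption, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"length": ["4:26:00", "1:02:23", "9:31", "1:21"]})
s = df["length"].str.strip()

# Try H:M:S first, then fall back to M:S for rows that failed to parse
parsed = pd.to_datetime(s, format="%H:%M:%S", errors="coerce").fillna(
    pd.to_datetime(s, format="%M:%S", errors="coerce")
)

# Ignore the placeholder date and combine the time-of-day components
df["minutes"] = parsed.dt.hour * 60 + parsed.dt.minute + parsed.dt.second / 60
```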

CodePudding user response:

Instead of using datetime, you can use timedelta, since you're working with durations. For example:

import pandas as pd

df = pd.DataFrame({'length': ["4:26:00", "1:02:23", "9:31", "1:21"]})

# where the hour is missing we prepend it as zero
m = df['length'].str.len() < 6
df.loc[m, 'length'] = '00:' + df['length'][m]

df['length'] = pd.to_timedelta(df['length'])

df['length']
0   0 days 04:26:00
1   0 days 01:02:23
2   0 days 00:09:31
3   0 days 00:01:21
Name: length, dtype: timedelta64[ns]
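
Once the column is a timedelta, the minutes the question asks for follow from the total number of seconds. A minimal sketch, repeating the setup so it runs on its own; the column name minutes and the use of str.count to find M:SS rows are assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"length": ["4:26:00", "1:02:23", "9:31", "1:21"]})

# Prepend a zero hour to M:SS values so every row reads as H:MM:SS
short = df["length"].str.count(":") == 1
df.loc[short, "length"] = "00:" + df.loc[short, "length"]

df["length"] = pd.to_timedelta(df["length"])

# Duration in minutes as a float; 4:26:00 becomes 266.0
df["minutes"] = df["length"].dt.total_seconds() / 60
```

If whole minutes are enough, the result can be rounded or floor-divided afterwards.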