Convert columns with time format HHMMSS to sec

Time:03-23

I have a very long pandas DataFrame (tens of thousands of rows). In one column, times are stored as strings in the format 'HHMMSS'.

For further calculations, I need to add a column next to it that holds the same time in seconds.

Here is my problem, illustrated with an example DataFrame with 3 rows and 1 column.

import pandas as pd

data = [['123455'], ['000010'], ['100000']]
df = pd.DataFrame(data, columns=['HHMMSS'])

print(df)

#   HHMMSS
#0  123455
#1  000010
#2  100000

def get_seconds(time_str):
    hh, mm, ss = time_str[0:2], time_str[2:4], time_str[4:6]
    return int(hh) * 3600 + int(mm) * 60 + int(ss)

sec=[get_seconds(df['HHMMSS'][0]),get_seconds(df['HHMMSS'][1]),get_seconds(df['HHMMSS'][2])]

df['sec']=sec

print(df)

#   HHMMSS    sec
#0  123455  45295
#1  000010     10
#2  100000  36000

What would a performant solution look like for very long DataFrames?
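A minimal sketch of how the get_seconds logic itself can be vectorized: instead of calling a Python function once per row, the .str accessor slices every string in the column at once, and the three parts are combined with column-wise arithmetic. This assumes every value is a six-character digit string, as in the example.

```python
import pandas as pd

data = [['123455'], ['000010'], ['100000']]
df = pd.DataFrame(data, columns=['HHMMSS'])

# Slice hours, minutes and seconds for the whole column at once
hh = df['HHMMSS'].str[0:2].astype(int)
mm = df['HHMMSS'].str[2:4].astype(int)
ss = df['HHMMSS'].str[4:6].astype(int)

# Combine the parts into total seconds (integer dtype)
df['sec'] = hh * 3600 + mm * 60 + ss
print(df)
```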

Answer:

Series.dt.total_seconds() is only available when the Series holds timedelta values (the same method also exists on TimedeltaIndex and TimedeltaArray); it does not exist for datetime values.
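A small illustration of that restriction (the sample values are made up to mirror the question): a timedelta Series exposes total_seconds() through .dt, while the .dt accessor of a datetime Series does not.

```python
import pandas as pd

# A Series of timedelta values exposes .dt.total_seconds()
td = pd.Series(pd.to_timedelta(['12:34:55', '00:00:10', '10:00:00']))
print(td.dt.total_seconds())

# A Series of datetime values does not: its .dt accessor has no total_seconds
dates = pd.Series(pd.to_datetime(['2020-01-01 12:34:55']))
print(hasattr(dates.dt, 'total_seconds'))  # False
```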

Since your values don't contain any separators, you need to pass the format argument to pd.to_datetime() so that pandas knows how to parse them. Then convert them to timedelta values with pd.to_timedelta(). Finally, call .dt.total_seconds() to get the total number of seconds of each timedelta.

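# Parse the strings; the date part defaults to 1900-01-01
df['datetime'] = pd.to_datetime(df['HHMMSS'], format='%H%M%S')
# Round-trip through 'HH:MM:SS' strings, which pd.to_timedelta() can parse
df['delta'] = pd.to_timedelta(df['datetime'].dt.strftime('%H:%M:%S'))
df['sec'] = df['delta'].dt.total_seconds()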
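A self-contained sketch of the same approach. As a variation (not part of the answer above), it skips the strftime round trip: subtracting each value's midnight, obtained with Series.dt.normalize(), turns the parsed datetime directly into a timedelta since the start of the day.

```python
import pandas as pd

data = [['123455'], ['000010'], ['100000']]
df = pd.DataFrame(data, columns=['HHMMSS'])

# Parse 'HHMMSS' strings; the date part defaults to 1900-01-01
times = pd.to_datetime(df['HHMMSS'], format='%H%M%S')

# Offset from midnight is a timedelta, so .dt.total_seconds() applies
df['sec'] = (times - times.dt.normalize()).dt.total_seconds()
print(df)
```

The result is a float column; append .astype(int) if integer seconds are preferred.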

Answer:

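df['col'].dt.total_seconds() should work, but only once the column holds timedelta values.

Converting the column to datetime is not enough, because the .dt accessor of a datetime Series has no total_seconds(). Convert it to timedelta instead; since the values have no separators, insert colons first so that pd.to_timedelta() can parse them:

df['delta'] = pd.to_timedelta(df['HHMMSS'].str[0:2] + ':' + df['HHMMSS'].str[2:4] + ':' + df['HHMMSS'].str[4:6])
df['sec'] = df['delta'].dt.total_seconds()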
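Another fully vectorized option, not taken from the answers above: if each value is a valid six-digit HHMMSS string, the column can be converted to integers once, and hours, minutes and seconds extracted with integer division and modulo. A minimal sketch under that assumption:

```python
import pandas as pd

data = [['123455'], ['000010'], ['100000']]
df = pd.DataFrame(data, columns=['HHMMSS'])

n = df['HHMMSS'].astype(int)  # e.g. '123455' -> 123455
hh = n // 10000               # 123455 -> 12
mm = n // 100 % 100           # 123455 -> 1234 -> 34
ss = n % 100                  # 123455 -> 55
df['sec'] = hh * 3600 + mm * 60 + ss
print(df)
```

This avoids string slicing and datetime parsing entirely, at the cost of not validating the input (for example, '009999' would silently produce a value instead of an error).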