Convert multiple columns with string in (hh:mm:ss) format to seconds (int) in a pandas dataframe

Time:08-22

I have a dataframe with 15 columns. Nine of them hold durations as strings in hh:mm:ss format, and I need to convert those to seconds (int). I am still new to pandas and Python, and wrote the code below.

import pandas as pd

# Example df
df = pd.DataFrame({'column_1':['00:12:21','01:20:01'],'column_2':['00:22:33','02:10:11']})

def duration_to_sec(x):
    # Convert an 'hh:mm:ss' string to seconds; anything else becomes 0.
    parts = x.split(':')
    if len(parts) == 3:
        h, m, s = parts
        return int(h)*60*60 + int(m)*60 + int(s)
    return 0

new_col = []
for i in df['column_1']:
    x = duration_to_sec(i)
    new_col.append(x)

df['column_1'] = pd.Series(new_col)
print(df)

Expected output (the other duration columns should end up like column_1):

   column_1  column_2
0       741  00:22:33
1      4801  02:10:11

It works, but it is very basic, and I would have to repeat it for each of the 9 columns. Is there a simpler, cleaner way to do this?

CodePudding user response:

Convert the strings with pd.to_timedelta, then call .dt.total_seconds() (note that it returns floats):

df['column_1'] = pd.to_timedelta(df['column_1']).dt.total_seconds()

Output:

   column_1  column_2
0     741.0  00:22:33
1    4801.0  02:10:11

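Converting multiple columns (df.apply calls the function on every column, so select only the duration columns first if the dataframe has others):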

def to_sec(s):
    return pd.to_timedelta(s).dt.total_seconds()

out = df.apply(to_sec)

Output:

   column_1  column_2
0     741.0    1353.0
1    4801.0    7811.0
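
The question asks for integer seconds, and only some of the columns hold durations. Here is a minimal sketch of that case, assuming every value in the selected columns is a valid hh:mm:ss string. The `duration_cols` list and the extra `name` column are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b'],  # hypothetical non-duration column, left untouched
    'column_1': ['00:12:21', '01:20:01'],
    'column_2': ['00:22:33', '02:10:11'],
})

# Hypothetical list: put the names of the duration columns here.
duration_cols = ['column_1', 'column_2']

def to_sec(s):
    # total_seconds() returns floats, so cast to int afterwards.
    return pd.to_timedelta(s).dt.total_seconds().astype(int)

# Convert only the selected columns and write them back in place.
df[duration_cols] = df[duration_cols].apply(to_sec)
print(df)
```

The cast with astype(int) raises if a value could not be parsed into a duration, so this version is only suitable when the input is known to be clean.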

CodePudding user response:

You can loop through a list of column names if needed:

for col in ('column_1', 'column_2'):  # List every column that needs conversion.
    df[col] = pd.to_timedelta(df[col])
    df[f'{col}_seconds'] = df[col].dt.total_seconds()

Output:

         column_1        column_2  column_1_seconds  column_2_seconds
0 0 days 00:12:21 0 days 00:22:33             741.0            1353.0
1 0 days 01:20:01 0 days 02:10:11            4801.0            7811.0
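
The function in the question returns 0 for anything that is not in hh:mm:ss form. Something similar can probably be done with the errors='coerce' option of pd.to_timedelta, which turns unparseable values into NaT. This is a sketch and not an exact equivalent: pandas may accept some strings that the original split-based check would reject. The `raw` frame and the `to_sec_or_zero` name are invented for the example.

```python
import pandas as pd

# Hypothetical data with one unparseable string and one missing value.
raw = pd.DataFrame({
    'column_1': ['00:12:21', 'n/a'],
    'column_2': ['02:10:11', None],
})

def to_sec_or_zero(s):
    # Unparseable or missing entries become NaT instead of raising.
    td = pd.to_timedelta(s, errors='coerce')
    # NaT yields NaN seconds; replace with 0 before casting to int.
    return td.dt.total_seconds().fillna(0).astype(int)

out = raw.apply(to_sec_or_zero)
print(out)
```

Whether 0 is the right fallback depends on the data. Leaving the NaN in place keeps bad rows distinguishable from genuine zero-length durations.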