I have a dataframe with 15 columns; 9 of them hold strings in duration format (hh:mm:ss) that I need to convert to seconds (int). I am still new to pandas and Python and wrote the code below.
import pandas as pd

# Example df
df = pd.DataFrame({'column_1':['00:12:21','01:20:01'],'column_2':['00:22:33','02:10:11']})

def duration_to_sec(x):
    # Convert an 'hh:mm:ss' string to seconds; anything else becomes 0.
    if len(x.split(':')) == 3:
        h, m, s = x.split(':')
        x = (int(h)*60*60) + (int(m)*60) + (int(s))
    else:
        x = 0
    return x

new_col = []
for i in df['column_1']:
    x = duration_to_sec(i)
    new_col.append(x)
df['column_1'] = pd.Series(new_col)
print(df)
Expected output (shown for column_1; the other columns should be converted the same way):
   column_1  column_2
0       741  00:22:33
1      4801  02:10:11
It works fine, but it is really basic, and I have to repeat this code for each of the 9 columns. What is a simpler and cleaner way to do this?
CodePudding user response:
Convert with to_timedelta, then call total_seconds:
df['column_1'] = pd.to_timedelta(df['column_1']).dt.total_seconds()
Output:
   column_1  column_2
0     741.0  00:22:33
1    4801.0  02:10:11
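Note that total_seconds returns floats (741.0), while the question asks for int. A minimal sketch of the extra cast, assuming every value in the column parses cleanly and there are no missing values:

```python
import pandas as pd

df = pd.DataFrame({'column_1': ['00:12:21', '01:20:01'],
                   'column_2': ['00:22:33', '02:10:11']})

# total_seconds() yields floats; cast to int once the values are
# known to be whole seconds with no NaN among them.
df['column_1'] = pd.to_timedelta(df['column_1']).dt.total_seconds().astype(int)
print(df)
```

If the column can contain missing or unparseable entries, the cast to int will fail until those are filled or dropped.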
Converting multiple columns:
def to_sec(s):
    return pd.to_timedelta(s).dt.total_seconds()

out = df.apply(to_sec)
Output:
   column_1  column_2
0     741.0    1353.0
1    4801.0    7811.0
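Since only 9 of the 15 columns hold durations, calling apply on the whole dataframe would fail on the other columns. A possible sketch that restricts the conversion to a chosen subset; the list name duration_cols and the extra 'name' column are assumptions made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'column_1': ['00:12:21', '01:20:01'],
                   'column_2': ['00:22:33', '02:10:11'],
                   'name': ['a', 'b']})  # hypothetical non-duration column

# Assumed: list every duration column here (9 of them in the question).
duration_cols = ['column_1', 'column_2']

def to_sec(s):
    # Parse one column of 'hh:mm:ss' strings and return whole seconds.
    return pd.to_timedelta(s).dt.total_seconds().astype(int)

# Convert only the selected columns; all other columns are left untouched.
df[duration_cols] = df[duration_cols].apply(to_sec)
print(df)
```

The function is defined once and reused for every listed column, which addresses the repetition in the question.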
CodePudding user response:
You can loop through a list of column names if needed:
for col in ('column_1', 'column_2'):  # Put all your columns that need conversion.
    df[col] = pd.to_timedelta(df[col])
    df[f'{col}_seconds'] = df[col].dt.total_seconds()
Output:
         column_1        column_2  column_1_seconds  column_2_seconds
0 0 days 00:12:21 0 days 00:22:33             741.0            1353.0
1 0 days 01:20:01 0 days 02:10:11            4801.0            7811.0
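The function in the question falls back to 0 when a value is not in hh:mm:ss form, whereas to_timedelta raises on strings it cannot parse. A rough sketch of one way to get a similar fallback, assuming errors='coerce' is acceptable for your data; the 'n/a' entry is a made-up malformed value. It is not an exact equivalent, because to_timedelta accepts more formats than hh:mm:ss:

```python
import pandas as pd

# 'n/a' is a hypothetical malformed entry added for illustration.
s = pd.Series(['00:12:21', 'n/a', '02:10:11'])

seconds = (
    pd.to_timedelta(s, errors='coerce')  # unparseable strings become NaT
      .dt.total_seconds()                # NaT becomes NaN
      .fillna(0)                         # mirror the question's fallback of 0
      .astype(int)                       # safe now that no NaN remains
)
print(seconds.tolist())
```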