I have a very long pandas DataFrame (tens of thousands of rows). In one column the time is stored in the format 'HHMMSS'.
For my further calculations I need to add, next to this column, a column containing the time in seconds.
Here is my problem, illustrated with an example DataFrame with 3 rows and 1 column:
import pandas as pd
data = [['123455'], ['000010'], ['100000']]
df = pd.DataFrame(data, columns=['HHMMSS'])
print(df)
# HHMMSS
#0 123455
#1 000010
#2 100000
def get_seconds(time_str):
    # Split 'HHMMSS' into hours, minutes and seconds
    hh, mm, ss = time_str[0:2], time_str[2:4], time_str[4:6]
    return int(hh) * 3600 + int(mm) * 60 + int(ss)
sec=[get_seconds(df['HHMMSS'][0]),get_seconds(df['HHMMSS'][1]),get_seconds(df['HHMMSS'][2])]
df['sec']=sec
print(df)
# HHMMSS sec
#0 123455 45295
#1 000010 10
#2 100000 36000
What would a performant solution look like for very long DataFrames?
Answer:
Series.dt.total_seconds() only works on a Series containing timedelta values, accessed through the .dt namespace; TimedeltaIndex and TimedeltaArray provide total_seconds() as well. It is not available on datetime values.
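A small check of that distinction, as a sketch (the Series names td and dt are illustrative only): the .dt accessor of a timedelta Series has total_seconds(), while the .dt accessor of a datetime Series does not.

```python
import pandas as pd

# A Series of timedeltas exposes total_seconds() under .dt
td = pd.Series(pd.to_timedelta(['00:00:10', '10:00:00']))
td_seconds = td.dt.total_seconds()  # float64 values: 10.0, 36000.0

# A Series of datetimes does not: its .dt accessor has no total_seconds
dt = pd.Series(pd.to_datetime(['1900-01-01 00:00:10']))
has_total_seconds = hasattr(dt.dt, 'total_seconds')  # False
print(td_seconds.tolist(), has_total_seconds)
```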
Since your values don't contain any separator, you need to pass the format argument to pd.to_datetime() so pandas knows how to parse them. Then convert the time part to timedelta values with pd.to_timedelta(), and finally use the dt.total_seconds() method to get the total number of seconds of each timedelta.
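# Parse the 'HHMMSS' strings; the date part defaults to 1900-01-01
df['datetime'] = pd.to_datetime(df['HHMMSS'], format='%H%M%S')
# Format the time part as 'HH:MM:SS', which pd.to_timedelta() can parse
df['delta'] = pd.to_timedelta(df['datetime'].dt.strftime('%H:%M:%S'))
# total_seconds() returns floats, e.g. 45295.0
df['sec'] = df['delta'].dt.total_seconds()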
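One possible variation, a sketch that is not benchmarked here: instead of formatting the datetimes back to text with strftime, subtract midnight of the same day from each parsed datetime, which yields timedeltas directly. It assumes every value is a valid six-digit HHMMSS string; the name t is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'HHMMSS': ['123455', '000010', '100000']})

# Parse the strings as datetimes on the default date 1900-01-01
t = pd.to_datetime(df['HHMMSS'], format='%H%M%S')

# Subtracting midnight of the same day gives timedeltas without a string round-trip
df['sec'] = (t - t.dt.normalize()).dt.total_seconds().astype(int)
print(df)
```

Because pd.to_datetime() is given an explicit format, malformed entries raise an error instead of being silently misread.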
Answer:
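df['HHMMSS'].dt.total_seconds() works once the column holds timedelta values. It is not available on datetime values, so converting with pd.to_datetime() alone is not enough.
Convert the column to timedeltas first, for example by inserting colons so pd.to_timedelta() can parse 'HH:MM:SS':
df['td'] = pd.to_timedelta(df['HHMMSS'].str.replace(r'(\d{2})(\d{2})(\d{2})', r'\1:\2:\3', regex=True))
df['sec'] = df['td'].dt.total_seconds()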
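A different technique, not taken from either answer: since the values are fixed-width digit strings, plain integer arithmetic on the whole column avoids date parsing entirely. It is likely fast on long frames, though that is an untested assumption here. The sketch assumes every entry is exactly six digits; unlike pd.to_datetime(..., format='%H%M%S'), it does not reject invalid times such as '256000'.

```python
import pandas as pd

df = pd.DataFrame({'HHMMSS': ['123455', '000010', '100000']})

# Treat each HHMMSS string as one integer, e.g. '123455' -> 123455
n = df['HHMMSS'].astype(int)

hours = n // 10000          # leading two digits
minutes = (n // 100) % 100  # middle two digits
seconds = n % 100           # trailing two digits

# Vectorized over the whole column, no Python-level loop
df['sec'] = hours * 3600 + minutes * 60 + seconds
print(df)
```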