Import Excel data with datetime format into a pandas DataFrame and convert to seconds

I'm trying to import Excel data into a pandas DataFrame with datetime format. The data is an export file generated by a program that tracks work time. My code works fine, but I just realised that I had assumed my import file always contains the following format:

Task	Duration
First	1900-01-01 22:21:20
Second	1900-01-01 12:13:14

I didn't realise that the Duration starts showing the date '1900-01-01' whenever the duration exceeds 24 hours. It switches to the date '1900-01-02' whenever the duration exceeds 48 hours, and so on.

When the Duration is 6 hours, the value of the cell shows: '06:00:00'
When the Duration is 28 hours, the value of the cell shows: '1900-01-01 04:00:00'
When the Duration is 50 hours, the value of the cell shows: '1900-01-02 02:00:00'
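To make the arithmetic behind these three examples concrete, here is a minimal standard-library sketch. It assumes the cell values arrive as strings in exactly the formats shown, and that the implied origin is 1899-12-31 (so that '1900-01-01 04:00:00' means one day plus four hours). The names `ORIGIN` and `duration_to_seconds` are made up for illustration.

```python
from datetime import datetime

# Assumed origin: the day before 1900-01-01, so a date of 1900-01-01 counts as +24 hours.
ORIGIN = datetime(1899, 12, 31)


def duration_to_seconds(value: str) -> int:
    """Convert 'HH:MM:SS' or 'YYYY-MM-DD HH:MM:SS' to whole seconds."""
    value = value.strip()
    if len(value) == 8:
        # Time-only value (under 24 hours): prepend the origin date.
        value = "1899-12-31 " + value
    stamp = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return int((stamp - ORIGIN).total_seconds())


print(duration_to_seconds("06:00:00"))             # 6 h  -> 21600
print(duration_to_seconds("1900-01-01 04:00:00"))  # 28 h -> 100800
print(duration_to_seconds("1900-01-02 02:00:00"))  # 50 h -> 180000
```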

I now need a piece of code to convert this format to seconds as an integer. I used 'pd.to_timedelta' for this before I realised my mistake, but I always stripped out the date before calculating the total seconds. I couldn't find a way to do it directly in the timedelta documentation.

CodePudding user response：

IIUC, use this if the input values are in the format HH:MM:SS:

df['Seconds'] = pd.to_timedelta(df['Duration']).dt.total_seconds().astype(int)
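As a quick check, here is the same line applied to a small hand-built frame. The sample values are made up and assume time-only strings; values that carry a date part, such as '1900-01-01 04:00:00', are not covered by this approach.

```python
import pandas as pd

# Hypothetical sample data: durations under 24 hours, stored as 'HH:MM:SS' strings.
df = pd.DataFrame({
    "Task": ["First", "Second"],
    "Duration": ["22:21:20", "12:13:14"],
})

# Parse the strings as timedeltas, then express them as whole seconds.
df["Seconds"] = pd.to_timedelta(df["Duration"]).dt.total_seconds().astype(int)

print(df)
```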

CodePudding user response：

Add a date prefix where the Duration has no date part, then compute the difference between the resulting datetime and the origin (1899-12-31).

df['Duration'] = pd.to_datetime(
    df['Duration'].where(df['Duration'].str.len() != 8,
                         other='1899-12-31 ' + df['Duration'])
)

df['Seconds'] = (df['Duration'] - pd.to_datetime('1899-12-31')) \
                    .dt.total_seconds().astype(int)

Output:

Task	Duration	Seconds
Task_a	1900-01-01 04:00:00	100800
Task_b	1900-01-02 02:00:00	180000
Task_c	1899-12-31 06:00:00	21600
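The code above assumes that the Duration column holds strings. Depending on how the cells are formatted, reading the file with pd.read_excel may instead yield datetime.time and datetime.datetime objects in the same column. The following is only a sketch under that assumption; `to_seconds` is a hypothetical helper, and the frame is built by hand to stand in for the imported data.

```python
import datetime as dt

import pandas as pd

# Assumed origin, matching the answer above.
ORIGIN = dt.datetime(1899, 12, 31)


def to_seconds(value) -> int:
    """Convert a time, datetime, or string duration cell to whole seconds."""
    if isinstance(value, dt.datetime):
        # Date-stamped cell (24 hours or more); pd.Timestamp also lands here.
        return int((value - ORIGIN).total_seconds())
    if isinstance(value, dt.time):
        # Time-only cell (under 24 hours).
        return value.hour * 3600 + value.minute * 60 + value.second
    # Fallback: treat anything else as a string in one of the two formats.
    text = str(value).strip()
    if len(text) == 8:
        text = "1899-12-31 " + text
    return int((pd.to_datetime(text) - pd.Timestamp(ORIGIN)).total_seconds())


# Hand-built stand-in for an imported sheet with mixed cell types.
df = pd.DataFrame({
    "Task": ["Task_a", "Task_b", "Task_c"],
    "Duration": [
        dt.datetime(1900, 1, 1, 4, 0, 0),
        "1900-01-02 02:00:00",
        dt.time(6, 0, 0),
    ],
})

df["Seconds"] = df["Duration"].map(to_seconds)
print(df)
```

Checking the cell type per row is slower than the vectorised version, but it avoids relying on .str methods when the column is not made of strings.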