I'm trying to import Excel data into a pandas DataFrame with a datetime format. The data is an export file generated by a program that tracks work time. My code works fine, but I just realised that I had assumed my import file always contains the following format:
Task | Duration |
---|---|
First | 1900-01-01 22:21:20 |
Second | 1900-01-01 12:13:14 |
I didn't realise that the Duration starts showing the date '1900-01-01' whenever it exceeds 24 hours. It switches to the date '1900-01-02' whenever it exceeds 48 hours, and so on:
- When the Duration is 6 hours, the cell shows: '06:00:00'
- When the Duration is 28 hours, the cell shows: '1900-01-01 04:00:00'
- When the Duration is 50 hours, the cell shows: '1900-01-02 02:00:00'
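The three examples above are consistent with the duration being added to a base date of 1899-12-31. That base date is an assumption inferred from the examples, not something the export tool documents. A minimal sketch using only the standard library, where `displayed_value` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

# Assumed base date, inferred from the examples above:
# a 28-hour duration is displayed as 1900-01-01 04:00:00.
ORIGIN = datetime(1899, 12, 31)

def displayed_value(hours):
    """Return the timestamp a duration of `hours` maps to under that assumption."""
    return ORIGIN + timedelta(hours=hours)

for hours in (6, 28, 50):
    print(hours, "->", displayed_value(hours))
```

For durations under 24 hours the cell shows only the time part, so the 6-hour case appears as '06:00:00' rather than with the 1899-12-31 date.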
I now need a piece of code that converts this format to seconds as an integer. I used 'pd.to_timedelta' for this before I realised my mistake, but I always stripped out the date before calculating the total seconds. I couldn't find a way to do it directly in the timedelta documentation.
CodePudding user response:
IIUC, use this if the input values are in the format HH:MM:SS:
df['Seconds'] = pd.to_timedelta(df['Duration']).dt.total_seconds().astype(int)
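As a quick check of the one-liner above, here is a small self-contained sketch. The sample rows are made up for illustration and contain only plain HH:MM:SS strings. Values that carry a date part are not covered by this approach and need separate handling.

```python
import pandas as pd

# Illustrative sample data: durations below 24 hours, stored as HH:MM:SS strings.
df = pd.DataFrame({
    "Task": ["First", "Second"],
    "Duration": ["22:21:20", "12:13:14"],
})

# Parse the strings as timedeltas, then convert to whole seconds.
df["Seconds"] = pd.to_timedelta(df["Duration"]).dt.total_seconds().astype(int)
print(df)
```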
CodePudding user response:
Add a date prefix if the Duration has no date part, then compute the difference between the resulting datetime and the origin (1899-12-31).
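# Values of length 8 are plain 'HH:MM:SS'; prepend the origin date to those only.
df['Duration'] = pd.to_datetime(
    df['Duration'].where(df['Duration'].str.len() != 8,
                         other='1899-12-31 ' + df['Duration'])
)
# Elapsed time since the origin, as whole seconds.
df['Seconds'] = (df['Duration'] - pd.to_datetime('1899-12-31')) \
    .dt.total_seconds().astype(int)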
Output:
Task | Duration | Seconds |
---|---|---|
Task_a | 1900-01-01 04:00:00 | 100800 |
Task_b | 1900-01-02 02:00:00 | 180000 |
Task_c | 1899-12-31 06:00:00 | 21600 |
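The approach above relies on the Duration column holding strings. Depending on how the cells are stored, an Excel import may instead yield `datetime.time` and `datetime.datetime` objects. That is an assumption about the export file, so inspect the column first. Under that assumption, a per-value sketch that accepts all three kinds of input could look like this, where `to_seconds` is a hypothetical helper name and the sample rows are made up:

```python
import datetime as dt
import pandas as pd

# Origin used in the answer above: durations are offsets from this date.
ORIGIN = pd.Timestamp("1899-12-31")

def to_seconds(value):
    """Convert one Duration cell to whole seconds (hypothetical helper)."""
    if isinstance(value, dt.datetime):  # also matches pd.Timestamp
        return int((pd.Timestamp(value) - ORIGIN).total_seconds())
    if isinstance(value, dt.time):      # time-only cell, under 24 hours
        return value.hour * 3600 + value.minute * 60 + value.second
    text = str(value).strip()
    if len(text) == 8:                  # plain 'HH:MM:SS' string
        return int(pd.Timedelta(text).total_seconds())
    return int((pd.Timestamp(text) - ORIGIN).total_seconds())

# Illustrative sample data mixing the cell types discussed above.
df = pd.DataFrame({
    "Task": ["Task_a", "Task_b", "Task_c", "Task_d"],
    "Duration": [
        dt.datetime(1900, 1, 1, 4, 0, 0),
        "1900-01-02 02:00:00",
        dt.time(6, 0, 0),
        "06:00:00",
    ],
})
df["Seconds"] = df["Duration"].map(to_seconds)
print(df)
```

Mapping a Python function over the column is slower than the vectorised version, but it keeps the handling of each cell type explicit.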