I am importing data from an Excel worksheet where I have a 'Duration' field displayed in [h]:mm (so that the total number of hours is shown). I understand that underneath, this is simply number of days as a float.
I want to work with this as a timedelta
column or similar in a Pandas dataframe but no matter what I do it's dropping any hours over 24 (e.g. the days portion).
Excel data (over 24 hours highlighted):
Pandas import (1d 7h 51m):
BATCH_NO Duration
354 7154 04:36:00
465 7270 06:35:00
466 7271 08:05:00
467 7272 05:54:00
468 7273 09:10:00
472 7277 06:15:00
476 7280 10:23:00
477 7284 06:09:00
499 7313 06:46:00
503 7322 05:27:00
510 7333 14:15:00
515 7335 1900-01-01 07:51:00
516 7338 07:51:00
517 7339 09:00:00
518 7339 05:29:00
519 7339 09:00:00
520 7339 05:29:00
522 7342 12:10:00
525 7343 08:00:00
530 7346 08:25:00
Running a to_datetime
conversion simply drops the day (integer) part of the column:
BATCH_NO Duration
354 7154 04:36:00
465 7270 06:35:00
466 7271 08:05:00
467 7272 05:54:00
468 7273 09:10:00
472 7277 06:15:00
476 7280 10:23:00
477 7284 06:09:00
499 7313 06:46:00
503 7322 05:27:00
510 7333 14:15:00
515 7335 07:51:00
516 7338 07:51:00
517 7339 09:00:00
518 7339 05:29:00
519 7339 09:00:00
520 7339 05:29:00
522 7342 12:10:00
525 7343 08:00:00
530 7346 08:25:00
I have tried importing by fixing the dtype as float
, but only str
or object
work - dtype={'Duration': str}
works.
float
gives the error float() argument must be a string or a number, not 'datetime.time'
and even with str
or object
, Python still thinks the column i a datetime.time
Ideally I do not want to change the Excel source data or export to .csv as in intermediate step.
CodePudding user response:
If I got it correctly, the imported objects are datetime
and time
with the datetime in Julian calendar.
So you must convert with a custom function:
from datetime import datetime, time, timedelta
def convert(t):
if isinstance(t, time):
t = datetime.combine(datetime.min, t)
delta = t-datetime.min
if delta.days != 0:
delta -= timedelta(days=693594)
return delta
df['Duration'].apply(convert)
Output:
0 0 days 04:36:00
1 0 days 06:35:00
2 0 days 08:05:00
3 0 days 05:54:00
4 0 days 09:10:00
5 0 days 06:15:00
6 0 days 10:23:00
7 0 days 06:09:00
8 0 days 06:46:00
9 0 days 05:27:00
10 0 days 14:15:00
11 1 days 07:51:00 # corrected
12 0 days 07:51:00
13 0 days 09:00:00
14 0 days 05:29:00
15 0 days 09:00:00
...