Pandas read format %D:%H:%M:%S with python

Time:03-09

Currently I am reading in a data frame with timestamps from film in the format 00 (days):00 (hours, which roll over into a day at 24):00 (min):00 (sec).

pandas reads the time formats HH:MM:SS and YYYY:MM:DD HH:MM:SS fine, but is there a way to have it read a duration such as DD:HH:MM:SS?

Alternatively, using timedelta, how would I go about getting the DD into HH in the data frame, so that pandas can make it "1 day HH:MM:SS", for example?

Data sample

00:00:00:00
00:07:33:57 
02:07:02:13 
00:00:13:11 
00:00:10:11 
00:00:00:00 
00:06:20:06 
01:12:13:25 

Expected output for last sample

36:13:25

Thanks

CodePudding user response:

Convert the days separately, add them to the times, and finally apply a custom formatting function:

import pandas as pd

def f(x):
    # Format a Timedelta as H:MM:SS, folding whole days into the hour count.
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))


# The first two characters are the day count; the rest is HH:MM:SS.
d = pd.to_timedelta(df['col'].str[:2].astype(int), unit='D')
td = pd.to_timedelta(df['col'].str[3:])
df['col'] = d.add(td).apply(f)
print(df)
        col
0   0:00:00
1   7:33:57
2  55:02:13
3   0:13:11
4   0:10:11
5   0:00:00
6   6:20:06
7  36:13:25
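
To try the steps above end to end, here is a minimal self-contained sketch. The sample frame and the column name `col` mirror the question; the names `format_hours`, `parts` and `hms` are made up for illustration. It splits on the first colon instead of slicing fixed positions, which is my own assumption to cope with day counts wider than two digits, not something the answer requires.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col": [
    "00:00:00:00", "00:07:33:57", "02:07:02:13", "00:00:13:11",
    "00:00:10:11", "00:00:00:00", "00:06:20:06", "01:12:13:25",
]})

def format_hours(td):
    """Render a Timedelta as H:MM:SS with whole days folded into hours."""
    total = int(td.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

# Split "DD:HH:MM:SS" into the day count and the "HH:MM:SS" remainder.
parts = df["col"].str.split(":", n=1, expand=True)
days = pd.to_timedelta(parts[0].astype(int), unit="D")
times = pd.to_timedelta(parts[1])

# Hypothetical output column; the answer overwrites 'col' instead.
df["hms"] = (days + times).apply(format_hours)
print(df)
```

Writing to a separate column keeps the original strings around, which is handy if you want to compare input and output side by side.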

CodePudding user response:

If you want timedelta objects, a simple way is to replace the first colon with "days " so that pandas can parse the string:

df['timedelta'] = pd.to_timedelta(df['col'].str.replace(':', 'days ', n=1))

output:

           col       timedelta
0  00:00:00:00 0 days 00:00:00
1  00:07:33:57 0 days 07:33:57
2  02:07:02:13 2 days 07:02:13
3  00:00:13:11 0 days 00:13:11
4  00:00:10:11 0 days 00:10:11
5  00:00:00:00 0 days 00:00:00
6  00:06:20:06 0 days 06:20:06
7  01:12:13:25 1 days 12:13:25
>>> df.dtypes
col                   object
timedelta    timedelta64[ns]
dtype: object
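
Since the column now holds real timedeltas, arithmetic on it works directly. A small sketch on three of the sample rows, assuming you want the duration as a float number of hours; the column name `hours` is hypothetical, and I pad the replacement as " days " and pass `regex=False` purely to keep the parsed string and the literal colon unambiguous.

```python
import pandas as pd

df = pd.DataFrame({"col": ["00:07:33:57", "02:07:02:13", "01:12:13:25"]})

# "01:12:13:25" -> "01 days 12:13:25", a form pd.to_timedelta understands.
df["timedelta"] = pd.to_timedelta(
    df["col"].str.replace(":", " days ", n=1, regex=False)
)

# Total duration in hours as a float (hypothetical extra column).
df["hours"] = df["timedelta"].dt.total_seconds() / 3600
print(df)
```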

From there it's also relatively easy to combine the days and hours as a string:

c = df['timedelta'].dt.components
df['str_format'] = ((c['hours'] + c['days']*24).astype(str)
                    + df['col'].str.split('(?=:)', n=2).str[-1]).str.zfill(8)

output:

           col       timedelta str_format
0  00:00:00:00 0 days 00:00:00   00:00:00
1  00:07:33:57 0 days 07:33:57   07:33:57
2  02:07:02:13 2 days 07:02:13   55:02:13
3  00:00:13:11 0 days 00:13:11   00:13:11
4  00:00:10:11 0 days 00:10:11   00:10:11
5  00:00:00:00 0 days 00:00:00   00:00:00
6  00:06:20:06 0 days 06:20:06   06:20:06
7  01:12:13:25 1 days 12:13:25   36:13:25
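
If the original string column is no longer around, the same string can be assembled from the components alone. This is only a sketch of that variant, assuming a `timedelta` column like the one above; the helper name `two_digits` is made up.

```python
import pandas as pd

df = pd.DataFrame({"timedelta": pd.to_timedelta(
    ["0 days 07:33:57", "2 days 07:02:13", "1 days 12:13:25"]
)})

def two_digits(s):
    """Zero-pad an integer Series to at least two characters."""
    return s.astype(str).str.zfill(2)

c = df["timedelta"].dt.components

# Fold days into hours, then join the zero-padded parts with colons.
df["str_format"] = (
    two_digits(c["days"] * 24 + c["hours"])
    + ":" + two_digits(c["minutes"])
    + ":" + two_digits(c["seconds"])
)
print(df)
```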