Convert string hours to minutes with pd.eval

Time:04-16

I want to convert all rows of my DataFrame that contain hours and minutes into minutes only. I have a dataframe that looks like this:

df=
    time
0    8h30
1    14h07
2    08h30
3    7h50
4    8h0 
5    8h15
6    6h15

I'm using the following method to convert:

df['time'] = pd.eval(
    df['time'].replace(['h'], ['*60 '], regex=True))

Output

SyntaxError: invalid syntax

I think the error comes from the format of the hours: maybe pd.eval can't accept values like 08h30 or 8h0. How can I solve this problem?
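A quick way to see what goes wrong is to look at the strings the replacement produces. The sketch below rebuilds the question's data and applies the same 'h' → '*60 ' substitution. It then checks each candidate expression with Python's built-in compile(). This is only an approximation of what pd.eval will reject (pd.eval parses Python-like expressions, but compile() is not pd.eval itself), and the is_valid_python helper is a hypothetical name used for illustration.

```python
import pandas as pd

df = pd.DataFrame({"time": ["8h30", "14h07", "08h30", "7h50", "8h0", "8h15", "6h15"]})

# Same substitution as the question: "8h30" becomes "8*60 30".
# There is no operator between "8*60" and "30", hence the SyntaxError.
exprs = df["time"].str.replace("h", "*60 ", regex=False)
print(exprs.tolist())

def is_valid_python(expr: str) -> bool:
    """Hypothetical helper: True if expr parses as a Python expression."""
    try:
        compile(expr, "<expr>", "eval")
        return True
    except SyntaxError:
        return False

# Even with an explicit "+", zero-padded parts such as "08" or "07"
# are not valid integer literals in Python 3.
fixed = df["time"].str.replace("h", "*60+", regex=False)
print([(e, is_valid_python(e)) for e in fixed])
```

This is why string-to-expression tricks are fragile here. Parsing the hours and minutes explicitly avoids both problems.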

Answer:

To avoid having to trim leading zeros, here is an alternative approach:

df[['h', 'm']] = df['time'].str.split('h', expand=True).astype(int)
df['total_min'] = df['h']*60 + df['m']

Result:

    time   h   m      total_min
0   8h30   8  30            510
1  14h07  14   7            847
2  08h30   8  30            510
3   7h50   7  50            470
4    8h0   8   0            480
5   8h15   8  15            495
6   6h15   6  15            375
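A self-contained version of the split approach, wrapped in a small function, might look like the sketch below. The helper name hours_to_minutes is hypothetical. The extra .str.strip() is an assumption added to cope with stray whitespace such as the trailing space in "8h0 ". It also assumes every value has the form "<hours>h<minutes>".

```python
import pandas as pd

def hours_to_minutes(times: pd.Series) -> pd.Series:
    """Hypothetical helper: convert strings like '8h30' or '08h30' to total minutes."""
    # Split on 'h' into an hours column (0) and a minutes column (1).
    # Converting to int drops leading zeros, so '08' -> 8 and '07' -> 7.
    parts = times.str.strip().str.split("h", expand=True).astype(int)
    return parts[0] * 60 + parts[1]

df = pd.DataFrame({"time": ["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"]})
df["total_min"] = hours_to_minutes(df["time"])
print(df)
```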

Answer:

Pandas can already parse such strings if the units are included in the string. While 14h07 can't be parsed (why assume 07 is minutes?), 14h07m can be converted to a Timedelta:

>>> pd.to_timedelta("14h07m")
Timedelta('0 days 14:07:00')
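For a single value, the resulting Timedelta can be converted to total minutes or seconds, or split into its fields, as in this minimal sketch (variable names are illustrative):

```python
import pandas as pd

td = pd.to_timedelta("14h07m")

# Total duration in a given unit: divide by a one-unit Timedelta (returns a float).
total_minutes = td / pd.Timedelta(minutes=1)
total_seconds = td.total_seconds()

# Individual fields of the duration (days, hours, minutes, seconds, ...).
parts = td.components
print(total_minutes, total_seconds, parts.hours, parts.minutes)
```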

Given this dataframe:

d1 = pd.DataFrame(['8h30m', '14h07m', '08h30m', '8h0m'],
                  columns=['time'])

You can convert the time column into a Timedelta series with pd.to_timedelta:

>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d1
     time              tm
0   8h30m 0 days 08:30:00
1  14h07m 0 days 14:07:00
2  08h30m 0 days 08:30:00
3    8h0m 0 days 08:00:00

To handle the missing minutes unit in the original data, just append 'm':

d1['tm'] = pd.to_timedelta(d1['time'] + 'm')
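Applied end to end to the question's original data, the append-'m' idea could look like the sketch below. The .str.strip() call is an added assumption to handle the trailing space in "8h0 ". The final cast to int assumes every value is a whole number of minutes.

```python
import pandas as pd

# The question's data, including the trailing space on "8h0 ".
df = pd.DataFrame({"time": ["8h30", "14h07", "08h30", "7h50", "8h0 ", "8h15", "6h15"]})

# Strip whitespace and append the missing minutes unit ("8h30" -> "8h30m").
td = pd.to_timedelta(df["time"].str.strip() + "m")

# Dividing by a one-minute Timedelta gives float minutes. Cast to int for whole minutes.
df["total_min"] = (td / pd.Timedelta(minutes=1)).astype(int)
print(df)
```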

Once you have a Timedelta, you can calculate hours and minutes.

The components of each value can be retrieved through the .dt.components accessor, which is the Series counterpart of Timedelta.components:

>>> d1.tm.dt.components.hours
0     8
1    14
2     8
3     8
Name: hours, dtype: int64

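To get the total minutes (or seconds, or hours), divide by a one-unit Timedelta. On pandas versions before 2.0, d1.tm.astype('timedelta64[m]') returned the same floats, but newer versions no longer support that conversion:

>>> d1.tm / pd.Timedelta(minutes=1)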
0    510.0
1    847.0
2    510.0
3    480.0
Name: tm, dtype: float64
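The same division idea works for other units. For seconds there is also the built-in .dt.total_seconds() accessor. A short sketch on the answer's example values:

```python
import pandas as pd

tm = pd.to_timedelta(pd.Series(["8h30m", "14h07m", "08h30m", "8h0m"]))

# Divide by a one-unit Timedelta to express each duration in that unit (float64).
minutes = tm / pd.Timedelta(minutes=1)
hours = tm / pd.Timedelta(hours=1)

# Built-in shortcut for seconds.
seconds = tm.dt.total_seconds()

print(minutes.tolist())
print(seconds.tolist())
print(hours.round(2).tolist())
```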

Bringing all the operations together:

>>> d1['tm'] = pd.to_timedelta(d1['time'])
>>> d2 = (d1.assign(h=d1.tm.dt.components.hours,
...                 m=d1.tm.dt.components.minutes,
...                 total_minutes=d1.tm / pd.Timedelta(minutes=1)))
>>>
>>> d2
     time              tm   h   m  total_minutes
0   8h30m 0 days 08:30:00   8  30          510.0
1  14h07m 0 days 14:07:00  14   7          847.0
2  08h30m 0 days 08:30:00   8  30          510.0
3    8h0m 0 days 08:00:00   8   0          480.0