Efficient way of binning timedeltas

Time:08-13

Suppose I have the following timedeltas:

s = pd.Series({0: pd.Timedelta('0 days 00:00:01.119000'),
 1: pd.Timedelta('0 days 00:00:00.555000'),
 2: pd.Timedelta('0 days 00:00:00.282000'),
 3: pd.Timedelta('0 days 00:00:00.182000'),
 4: pd.Timedelta('0 days 00:00:00.345000')})

I am trying to build a bin column that indicates whether each value is longer than one second.

Here is my approach:

l = ['00:00:00','00:00:01']
bins_sec = pd.to_timedelta(l).total_seconds()
cat = ['<1sec','>1sec']
pd.to_timedelta(list(df.s)).total_seconds()
# I don't quite understand why I need to convert my series into a list before I can call total_seconds()

Wanted result:

s = pd.Series(['>1sec','<1sec','<1sec','<1sec','<1sec'])

My question: is there a more efficient way to do this binning? Mine takes quite a while. Also, why doesn't total_seconds work when called as df.s.total_seconds()?

I get the following error when I try that:

AttributeError: 'Series' object has no attribute 'total_seconds'

CodePudding user response:

You can use s.dt.total_seconds() together with np.where:

from pandas import Timedelta
import pandas as pd 
import numpy as np

s = pd.Series({0: Timedelta('0 days 00:00:01.119000'),
 1: Timedelta('0 days 00:00:00.555000'),
 2: Timedelta('0 days 00:00:00.282000'),
 3: Timedelta('0 days 00:00:00.182000'),
 4: Timedelta('0 days 00:00:00.345000')})

np.where(s.dt.total_seconds().gt(1),'>1sec','<=1sec')

Output

array(['>1sec', '<=1sec', '<=1sec', '<=1sec', '<=1sec'], dtype='<U6')
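
On the second part of the question: total_seconds is a method of a single Timedelta and of a TimedeltaIndex, not of a Series, which is why passing the column through pd.to_timedelta(list(...)) appeared to help (that call returns a TimedeltaIndex). On a Series, the same method is reached through the .dt accessor. The sketch below assumes the same sample values as above and also shows that the conversion to seconds can be skipped by comparing against a pd.Timedelta directly; the names threshold and labels are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample values as in the question, built from strings.
s = pd.Series(pd.to_timedelta([
    "00:00:01.119", "00:00:00.555", "00:00:00.282",
    "00:00:00.182", "00:00:00.345",
]))

# A Series has no total_seconds(); the .dt accessor exposes it element-wise.
seconds = s.dt.total_seconds()

# A timedelta Series can be compared with a Timedelta directly,
# so the float conversion is optional for a simple threshold.
threshold = pd.Timedelta(seconds=1)  # hypothetical name, not from the question
labels = np.where(s > threshold, ">1sec", "<=1sec")
```

Both forms are vectorized, so either should avoid the cost of building a Python list first.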

CodePudding user response:

For a generic method, use pandas.cut:

bins = [0, 0.5, 1] # seconds

pd.cut(s, bins=[pd.Timedelta(f'{i}s') for i in bins] + [pd.Timedelta.max],
       labels=[f'>{i}s' for i in bins])

Output:

0      >1s
1    >0.5s
2      >0s
3      >0s
4      >0s
dtype: category
Categories (3, object): ['>0s' < '>0.5s' < '>1s']

Your particular case:

pd.cut(s, bins=[pd.Timedelta('0'), pd.Timedelta('1s'), pd.Timedelta.max],
       labels=['<1s', '>1s'])

Output:

0    >1s
1    <1s
2    <1s
3    <1s
4    <1s
dtype: category
Categories (2, object): ['<1s' < '>1s']
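
A possible variant, not part of the answer above, is to bin the float seconds instead of the timedeltas, which lets np.inf stand in for the open upper edge. This is only a sketch under the assumption that the sample data from the question is used; the names edges, binned and counts are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample values as in the question, built from strings.
s = pd.Series(pd.to_timedelta([
    "00:00:01.119", "00:00:00.555", "00:00:00.282",
    "00:00:00.182", "00:00:00.345",
]))

# Bin edges in seconds; np.inf acts as the open upper bound.
edges = [0, 1, np.inf]

# Intervals are right-closed by default: (0, 1] and (1, inf].
binned = pd.cut(s.dt.total_seconds(), bins=edges, labels=["<=1sec", ">1sec"])

# Number of rows that fell into each bin.
counts = binned.value_counts()
```

Because the intervals are right-closed, a value of exactly one second falls in the first bin, and a value of exactly zero falls in no bin unless include_lowest=True is passed.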