Imagine I have the following timedeltas:
s = pd.Series({0: pd.Timedelta('0 days 00:00:01.119000'),
1: pd.Timedelta('0 days 00:00:00.555000'),
2: pd.Timedelta('0 days 00:00:00.282000'),
3: pd.Timedelta('0 days 00:00:00.182000'),
4: pd.Timedelta('0 days 00:00:00.345000')})
I am trying to create a bin column that flags whether each value is longer than one second.
Here is my approach:
l = ['00:00:00','00:00:01']
bins_sec = pd.to_timedelta(l).total_seconds()
cat = ['<1sec','>1sec']
pd.to_timedelta(list(df.s)).total_seconds()
# I don't quite understand why I need to convert my Series into a list before I can call total_seconds()
Wanted result:
s = pd.Series(['>1sec','<1sec','<1sec','<1sec','<1sec'])
My question is: is there a more efficient way to do this binning? Mine is taking quite a while. Also, why doesn't total_seconds work when I call df.s.total_seconds()?
I get the following error when I try that:
AttributeError: 'Series' object has no attribute 'total_seconds'
CodePudding user response:
You can use s.dt.total_seconds() and np.where:
from pandas import Timedelta
import pandas as pd
import numpy as np
s = pd.Series({0: Timedelta('0 days 00:00:01.119000'),
1: Timedelta('0 days 00:00:00.555000'),
2: Timedelta('0 days 00:00:00.282000'),
3: Timedelta('0 days 00:00:00.182000'),
4: Timedelta('0 days 00:00:00.345000')})
np.where(s.dt.total_seconds().gt(1),'>1sec','<=1sec')
Output
array(['>1sec', '<=1sec', '<=1sec', '<=1sec', '<=1sec'], dtype='<U6')
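On the question of why the list conversion seemed necessary: pd.to_timedelta on a list returns a TimedeltaIndex, which has total_seconds() as a direct method, whereas a Series reaches the same method through the .dt accessor. The following is a minimal sketch of that difference. It also compares the timedeltas directly against pd.Timedelta('1s') and wraps the np.where result in a Series so the original index is kept. The names idx_seconds, ser_seconds and labels are only illustrative, and the sample values are rebuilt from milliseconds for brevity.

```python
import numpy as np
import pandas as pd

# Same five durations as in the question, built from milliseconds.
s = pd.Series(pd.to_timedelta([1119, 555, 282, 182, 345], unit='ms'))

# A TimedeltaIndex exposes total_seconds() directly ...
idx_seconds = pd.to_timedelta(list(s)).total_seconds()
# ... while a Series exposes it through the .dt accessor.
ser_seconds = s.dt.total_seconds()

# Timedeltas can also be compared directly, without converting to seconds.
labels = pd.Series(np.where(s > pd.Timedelta('1s'), '>1sec', '<=1sec'),
                   index=s.index)
print(labels.tolist())
```

Both routes are vectorized, so either should be much faster than looping over the values in Python.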
CodePudding user response:
For a generic method, use pandas.cut:
bins = [0, 0.5, 1] # lower bin edges, in seconds
# append Timedelta.max as the upper edge of the last bin
pd.cut(s, bins=[pd.Timedelta(f'{i}s') for i in bins] + [pd.Timedelta.max],
       labels=[f'>{i}s' for i in bins])
Output:
0 >1s
1 >0.5s
2 >0s
3 >0s
4 >0s
dtype: category
Categories (3, object): ['>0s' < '>0.5s' < '>1s']
Your particular case:
pd.cut(s, bins=[pd.Timedelta('0'), pd.Timedelta('1s'), pd.Timedelta.max],
labels=['<1s', '>1s'])
Output:
0 >1s
1 <1s
2 <1s
3 <1s
4 <1s
dtype: category
Categories (2, object): ['<1s' < '>1s']
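One detail worth knowing: by default pd.cut builds right-closed intervals, so with the bins above a value of exactly one second falls into the first bin and a value of exactly zero falls outside all bins (NaN). If left-closed intervals are preferred, right=False is one way to get them. The sketch below assumes that is the desired behaviour; the sample series s2 and the label '>=1s' are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample including the edge cases 0 s and exactly 1 s.
s2 = pd.Series(pd.to_timedelta([0, 999, 1000, 1119], unit='ms'))

edges = [pd.Timedelta(0), pd.Timedelta('1s'), pd.Timedelta.max]

# right=False makes the intervals [0, 1s) and [1s, max).
binned = pd.cut(s2, bins=edges, labels=['<1s', '>=1s'], right=False)
print(binned.astype(str).tolist())
```

With the default right=True, passing include_lowest=True is the alternative if only the zero value needs to be kept in the first bin.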