I just want to do the below simple stuff, I have a time-series, label those times in between a range. I've found a post about using dt.xx to filter the time, but that's more complex when we want to mask time like 18:30
and 22:30
, when I use df.ts.dt.hour.gt(18) & (df.ts.dt.minute.eq(30))
, it won't gives me what I wanted.
So, Is there some function like the below or I can achieve some result with another method?
df['tmp'] = np.where( df.ts.dt.time.between('18:30','22:30') , True, False)
format of time series:
Timestamp('2021-10-01 17:30:00')
More data pieces can be obtained from here.
Currently, I'm using some very long and ugly code like:
df["tmp"] = np.where(df.ev_charging.eq(True), df.delivered_wh, 0)
df['p1'] = np.where(df.ts.dt.hour.eq(18) & df.ts.dt.minute.eq(30), 1,0)
df['p2'] = np.where(df.ts.dt.hour.eq(23) & df.ts.dt.minute.eq(30), 1,0)
df['p3'] = np.where(df.ts.dt.hour.eq(4) & df.ts.dt.minute.eq(30), 1,0)
df['p'] = np.where(df.p1.eq(1) | df.p2.eq(1) | df.p3.eq(1), 1, 0)
df = df.drop(['p1','p2','p3'],axis=1)
CodePudding user response:
Considering the dataframe below :
import pandas as pd
df = pd.DataFrame({'ts': [pd.Timestamp('2021-10-01 17:00:00'), pd.Timestamp('2021-10-01 17:30:00'), pd.Timestamp('2021-10-01 18:00:00'),
pd.Timestamp('2021-10-01 18:30:00'), pd.Timestamp('2021-10-01 19:00:00'), pd.Timestamp('2021-10-01 19:30:00')],
'f1': [1044, 26, 16, 10, 226, 86],
'f2': [0, 12, 10, 12, 32, 34],
'y': [False, False, False, False, False, False]})
print(df)
f1 f2 y
ts
2021-10-01 17:00:00 1044 0 False
2021-10-01 17:30:00 26 12 False
2021-10-01 18:00:00 16 10 False
2021-10-01 18:30:00 10 12 False
2021-10-01 19:00:00 226 32 False
2021-10-01 19:30:00 86 34 False
You can use pandas.Series.between_time
to check if a timestamp is between two given times.
df = df.set_index('ts')
df_btw = df.between_time('18:30','22:30')
dico = {'left_only': False, 'both': True}
(
df.merge(df_btw.reset_index()['ts'], left_index=True, right_on='ts', how='left', indicator=True)
.set_index('ts')
.assign(tmp=lambda x: x['_merge'].map(dico))
.drop(columns='_merge')
)
# Output :
f1 f2 y tmp
ts
2021-10-01 17:00:00 1044 0 False False
2021-10-01 17:30:00 26 12 False False
2021-10-01 18:00:00 16 10 False False
2021-10-01 18:30:00 10 12 False True
2021-10-01 19:00:00 226 32 False True
2021-10-01 19:30:00 86 34 False True
CodePudding user response:
Firstly, You should set the time series column as index
, and then use between_time
, like below:
df.set_index("ts").between_time(start='18:30', end='22:30')