Home > OS >  Is there a "between" function in Pandas that I can use to mask time series?
Is there a "between" function in Pandas that I can use to mask time series?

Time:09-23

I just want to do the below simple stuff, I have a time-series, label those times in between a range. I've found a post about using dt.xx to filter the time, but that's more complex when we want to mask time like 18:30 and 22:30, when I use df.ts.dt.hour.gt(18) & (df.ts.dt.minute.eq(30)) , it won't gives me what I wanted.

So, Is there some function like the below or I can achieve some result with another method?

df['tmp'] = np.where( df.ts.dt.time.between('18:30','22:30') , True, False)

format of time series:

Timestamp('2021-10-01 17:30:00')

More data pieces can be obtained from here.

Currently, I'm using some very long and ugly code like:

df["tmp"] = np.where(df.ev_charging.eq(True), df.delivered_wh, 0)
df['p1'] = np.where(df.ts.dt.hour.eq(18) & df.ts.dt.minute.eq(30), 1,0)
df['p2'] = np.where(df.ts.dt.hour.eq(23) & df.ts.dt.minute.eq(30), 1,0)
df['p3'] = np.where(df.ts.dt.hour.eq(4) & df.ts.dt.minute.eq(30), 1,0)
df['p'] = np.where(df.p1.eq(1) | df.p2.eq(1) | df.p3.eq(1), 1, 0)
df = df.drop(['p1','p2','p3'],axis=1)

CodePudding user response:

Considering the dataframe below :

import pandas as pd

df = pd.DataFrame({'ts': [pd.Timestamp('2021-10-01 17:00:00'), pd.Timestamp('2021-10-01 17:30:00'), pd.Timestamp('2021-10-01 18:00:00'), 
                          pd.Timestamp('2021-10-01 18:30:00'), pd.Timestamp('2021-10-01 19:00:00'), pd.Timestamp('2021-10-01 19:30:00')],
 'f1': [1044, 26, 16, 10, 226, 86],
 'f2': [0, 12, 10, 12, 32, 34],
 'y':  [False, False, False, False, False, False]})

print(df)

                       f1  f2      y
ts                                  
2021-10-01 17:00:00  1044   0  False
2021-10-01 17:30:00    26  12  False
2021-10-01 18:00:00    16  10  False
2021-10-01 18:30:00    10  12  False
2021-10-01 19:00:00   226  32  False
2021-10-01 19:30:00    86  34  False

You can use pandas.Series.between_time to check if a timestamp is between two given times.

df = df.set_index('ts')

df_btw = df.between_time('18:30','22:30')

dico = {'left_only': False, 'both': True}
(
    df.merge(df_btw.reset_index()['ts'], left_index=True, right_on='ts', how='left', indicator=True)
    .set_index('ts')
    .assign(tmp=lambda x: x['_merge'].map(dico))
    .drop(columns='_merge')
)
# Output :
                       f1  f2      y    tmp
ts                                         
2021-10-01 17:00:00  1044   0  False  False
2021-10-01 17:30:00    26  12  False  False
2021-10-01 18:00:00    16  10  False  False
2021-10-01 18:30:00    10  12  False   True
2021-10-01 19:00:00   226  32  False   True
2021-10-01 19:30:00    86  34  False   True

CodePudding user response:

Firstly, You should set the time series column as index, and then use between_time, like below:

df.set_index("ts").between_time(start='18:30', end='22:30')
  • Related