Home > Software engineering >  How to check timestamps and day period, then drop mismatch
How to check timestamps and day period, then drop mismatch

Time:11-13

I have data with timestamps. Users respond to questions and they also select day period (morning or evening). I want to drop rows where recorded timestamp and day period mismatch. So check, if timestamp is between 6am-12pm and discard if "daytime" is "evening", etc.

df
timestamps          daytime
2020-04-10 11:40    Morning
2022-04-12 19:32    Morning *(discard)*
2022-04-12 20:53    Evening
2022-04-15 22:50    Morning *(discard)*
2022-04-16 09:31    Evening*(discard)*

The rule should be: if between 06:00-12:00 and 'daytime' is Evening ==> Remove row/ if between 18:00 - 00:00 and 'daytime' is Morning ==> Remove row

I've tried:

remove = df[ (6< df['timestamp'].dt.hour < 12  & df['period'] == 'Evening') 
| (18< df['timestamp'].dt.hour < 23  & df['period'] == 'Morning')]
df.drop(remove , inplace=True)

CodePudding user response:

6< df['timestamp'].dt.hour < 12

such a triple operation isn't possible on Python yet...

I will create a function like

def get_part_of_day(h):
    return (
        "morning"
        if 6 <= h <= 12
        else "afternoon"
        if 18 <= h <= 23
        else "night"
    )

and

df['datetime'].dt.hour.apply(get_pat_of_day) 

will give you a column (pandas Serie) which you can easily compare.

CodePudding user response:

Instead of dropping you can use .query() to filter.

df["timestamps"] = pd.to_datetime(df["timestamps"])
df = df.query(
    "timestamps.dt.hour.between(6, 12, inclusive='both') & daytime.eq('Morning') | "
    "timestamps.dt.hour.between(18, 23, inclusive='both') & daytime.eq('Evening')"
).reset_index(drop=True)

print(df)

           timestamps  daytime
0 2020-04-10 11:40:00  Morning
1 2022-04-12 20:53:00  Evening
  • Related