I have data with timestamps. Users respond to questions and they also select day period (morning or evening). I want to drop rows where recorded timestamp and day period mismatch. So check, if timestamp is between 6am-12pm and discard if "daytime" is "evening", etc.
df
timestamps daytime
2020-04-10 11:40 Morning
2022-04-12 19:32 Morning *(discard)*
2022-04-12 20:53 Evening
2022-04-15 22:50 Morning *(discard)*
2022-04-16 09:31 Evening*(discard)*
The rule should be: if between 06:00-12:00 and 'daytime' is Evening ==> Remove row/ if between 18:00 - 00:00 and 'daytime' is Morning ==> Remove row
I've tried:
remove = df[ (6< df['timestamp'].dt.hour < 12 & df['period'] == 'Evening')
| (18< df['timestamp'].dt.hour < 23 & df['period'] == 'Morning')]
df.drop(remove , inplace=True)
CodePudding user response:
6< df['timestamp'].dt.hour < 12
such a triple operation isn't possible on Python yet...
I will create a function like
def get_part_of_day(h):
return (
"morning"
if 6 <= h <= 12
else "afternoon"
if 18 <= h <= 23
else "night"
)
and
df['datetime'].dt.hour.apply(get_pat_of_day)
will give you a column (pandas Serie) which you can easily compare.
CodePudding user response:
Instead of dropping you can use .query() to filter.
df["timestamps"] = pd.to_datetime(df["timestamps"])
df = df.query(
"timestamps.dt.hour.between(6, 12, inclusive='both') & daytime.eq('Morning') | "
"timestamps.dt.hour.between(18, 23, inclusive='both') & daytime.eq('Evening')"
).reset_index(drop=True)
print(df)
timestamps daytime
0 2020-04-10 11:40:00 Morning
1 2022-04-12 20:53:00 Evening