I have the following dataset of students taking 2 different exams:
df = pd.DataFrame({'student': 'A B C D E'.split(),
'sat_date': [datetime.datetime(2013,4,1),datetime.datetime(2013,5,1),
datetime.datetime(2013,5,2),datetime.datetime(2013,7,15),
datetime.datetime(2013,8,1)],
'act_date': [datetime.datetime(2013,4,12),datetime.datetime(2013,5,2),
datetime.datetime(2013,4,12), datetime.datetime(2013,7,1),
datetime.datetime(2013,8,2)]})
print(df)
student sat_date act_date
0 A 2013-04-01 2013-04-12
1 B 2013-05-01 2013-05-02
2 C 2013-05-02 2013-04-12
3 D 2013-07-15 2013-07-01
4 E 2013-08-01 2013-08-02
I want to select those students whose two exams are 10 days apart from each other in either ways (any exam can be taken before or after).
I am trying Timedelta
, but I'm not sure if it's optimal.
df[(df['sat_date'] >= df['act_date'] pd.Timedelta(days=10)) | (df['sat_date'] <= df['act_date'] - pd.Timedelta(days=10))]
Desired Output:
student sat_date act_date
0 A 2013-04-01 2013-04-12
2 C 2013-05-02 2013-04-12
3 D 2013-07-15 2013-07-01
Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!
CodePudding user response:
I would probably look at the absolute value of the difference between the two dates is greater to or equal than 10.
df.loc[abs((df['sat_date']-df['act_date']).dt.days).ge(10)]
CodePudding user response:
Try as follows:
result = df.loc[abs(df.sat_date - df.act_date).dt.days>=10]
print(result)
student sat_date act_date
0 A 2013-04-01 2013-04-12
2 C 2013-05-02 2013-04-12
3 D 2013-07-15 2013-07-01
Or maybe nicer:
df.loc[abs(df.sat_date - df.act_date).ge(pd.Timedelta(days=10))]
CodePudding user response:
You can filter like this:
df[abs((df['sat_date'] - df['act_date']).days) == 10]