new to pandas, I have a challenge regard checking values and performing multiple actions based on four variables (reception_date,final_date,Status,ID) the problem has the following table:
id user_email reception_date end_date status
0 42872 [email protected] 3/30/2022 3/30/2022 Accepted
1 42872 [email protected] 3/1/2022 3/4/2022 Returned
2 42872 [email protected] 3/7/2022 3/30/2022 In Study
3 99999 [email protected] 3/6/2022 3/28/2022 In Study
4 42872 [email protected] 3/23/2022 3/25/2022 In Study
5 99999 [email protected] 3/28/2022 4/5/2022 Accepted
6 78787 [email protected] 3/15/2022 3/16/2022 In Study
Firstly, it is required to perform operations for the same ID, (in this example only few are found, however, the database consists of more than 50,000 data), check if in the Status column contains "Accepted", once verified this, check if the "end_date" of the status "In Study" is equal to the "reception_date" of the status "Accepted", if this condition is true, change the status from "In Study" to "Accepted", the expected output would be as follows:
id user_email reception_date end_date status
0 42872 [email protected] 3/30/2022 3/30/2022 Accepted
1 42872 [email protected] 3/1/2022 3/4/2022 Returned
2 42872 [email protected] 3/7/2022 3/30/2022 Accepted
3 99999 [email protected] 3/6/2022 3/28/2022 Accepted
4 42872 [email protected] 3/23/2022 3/25/2022 In Study
5 99999 [email protected] 3/28/2022 4/5/2022 Accepted
6 78787 [email protected] 3/15/2022 3/16/2022 In Study
Since I'm relative new to pandas I've tried several methods, one of them being my last attemp,using
Test=Test.merge(Test.loc[Test.status== 'Accepted'], how='left', left_on=['id'], right_on=['id'], suffixes=("", "_y"))\
.assign(status=lambda x:np.where((x.end_date_y==x.reception_date) & (x.status== 'In Study'), 'Accepted',x.status))
However the result of this wasn't the expected output, I hope you can help me with this, it's driving me crazy.
CodePudding user response:
You can use:
# which rows are Accepted?
m1 = df['status'].eq('Accepted')
# which rows are In Study?
m2 = df['status'].eq('In Study')
# get In Study indices that also have an Accepted
# on the same date
to_change = (df[m2]
.reset_index()
.merge(df[m1],
left_on=['id', 'end_date'],
right_on=['id', 'reception_date'])
['index']
)
# [2, 3]
# update in place
df.loc[to_change, 'status'] = 'Accepted'
output:
id user_email reception_date end_date status
0 42872 [email protected] 3/30/2022 3/30/2022 Accepted
1 42872 [email protected] 3/1/2022 3/4/2022 Returned
2 42872 [email protected] 3/7/2022 3/30/2022 Accepted
3 99999 [email protected] 3/6/2022 3/28/2022 Accepted
4 42872 [email protected] 3/23/2022 3/25/2022 In Study
5 99999 [email protected] 3/28/2022 4/5/2022 Accepted
6 78787 [email protected] 3/15/2022 3/16/2022 In Study