Existing Dataframe :
Id sms_sent_date sms_number Action_date action_performed
A 21-11-2021 1 21-11-2021 opened
A 22-11-2021 2 22-11-2021 clicked
A 23-11-2021 3 23-11-2021 opened
A 23-11-2021 3 23-11-2021 proceeded
A 23-11-2021 3 23-11-2021 halted
B 11-12-2021 1 11-12-2021 opened
B 12-12-2021 2 12-12-2021 opened
B 12-12-2021 2 12-12-2021 clicked
B 12-12-2021 2 12-12-2021 succeded
Expected Dataframe :
Id sms_sent_date sms_number Action_date action_performed
A 23-11-2021 3 23-11-2021 opened
B 12-12-2021 2 12-12-2021 opened
I am trying to find out , what was the initial Action_date and first action_performed on the last sms_sent_date for each user
I know with this code sms_max = df.groupby(['Id']).agg({'sms_sent_date ': [np.max]})
i can get the last sms sent date , but how to fetch the respective info from other columns
CodePudding user response:
Filter rows by maximal sms_sent_date
per groups by GroupBy.transform
and then get first row by Id
by DataFrame.drop_duplicates
:
df1 = (df[df['sms_sent_date'].eq(df.groupby(['Id'])['sms_sent_date'].transform('max'))]
.drop_duplicates('Id'))
print (df1)
Id sms_sent_date sms_number Action_date action_performed
2 A 23-11-2021 3 23-11-2021 opened
6 B 12-12-2021 2 12-12-2021 opened