I have a dataframe which looks like this:
cuid day transactions
0 a mon 1
1 a tue 2
2 b tue 2
3 b wed 1
For each 'cuid' (customer ID) I want to find the day that maximizes the number of transactions. For e.g. for the above df, the output should be a df which looks like
cuid day transactions
1 a tue 2
2 b tue 2
I've tried code which looks like:
dd = {'cuid':['a','a','b','b'],'day':['mon','tue','tue','wed'],'transactions':[1,2,2,1]}
df = pd.DataFrame(dd)
dfg = df.groupby(by=['cuid']).agg(transactions=('transactions',max)).reset_index()
But I'm not able to figure out how to join dfg
and df
.
CodePudding user response:
Approach1
idxmax
gives you the index at which a certain column value (which is transaction
here) is the maximum.
First we set the index,
step1 = df.set_index(['day', 'cuid'])
Then we find the idxmax,
indices = step1.groupby('cuid')['transactions'].idxmax()
and we get the result
step1.loc[indices].reset_index()
Approach2
df.groupby('cuid')\
.apply(lambda df: df[df['transactions']==df['transactions'].max()])\
.reset_index(drop=True)