Home > Back-end >  finding combination of columns which maximizes a quantity
finding combination of columns which maximizes a quantity

Time:02-27

I have a dataframe which looks like this:

  cuid  day  transactions
0    a  mon             1
1    a  tue             2
2    b  tue             2
3    b  wed             1

For each 'cuid' (customer ID) I want to find the day that maximizes the number of transactions. For e.g. for the above df, the output should be a df which looks like

 cuid  day  transactions
1    a  tue             2
2    b  tue             2

I've tried code which looks like:

dd   = {'cuid':['a','a','b','b'],'day':['mon','tue','tue','wed'],'transactions':[1,2,2,1]}
df   = pd.DataFrame(dd)
dfg  = df.groupby(by=['cuid']).agg(transactions=('transactions',max)).reset_index()

But I'm not able to figure out how to join dfg and df.

CodePudding user response:

Approach1

idxmax gives you the index at which a certain column value (which is transaction here) is the maximum.

First we set the index,

step1 = df.set_index(['day', 'cuid'])

Then we find the idxmax,

indices = step1.groupby('cuid')['transactions'].idxmax()

and we get the result

step1.loc[indices].reset_index()

Approach2

df.groupby('cuid')\
    .apply(lambda df: df[df['transactions']==df['transactions'].max()])\
    .reset_index(drop=True)
  • Related