Home > Mobile >  Column contributing Maximum in pandas data frame
Column contributing Maximum in pandas data frame

Time:05-31

I'm using python 3.x I have pandas data frame df. df looks like

Name Maths Science Social studies
abc   80     70      90
cde   90     60      80 
xyz   100    80      85
...
...

I would like to generate a pandas data frame which will store student name, maximum marks & the subject contributed maximum marks. If maximum marks is 100 then it will consider next highest instead of 100. So my output data frame will look like

Name Highest_Marks Subject_contributed_Max
abc   90              Social Studies 
cde   90              Maths 
xyz   85              Social Studies

Can you suggest me how to do it?

CodePudding user response:

You can use:

df2 = df.drop(columns='Name').mask(df.eq(100))

df['Highest_Marks'] = df2.max(axis=1)
df['Subject_contributed_Max'] = df2.idxmax(axis=1)

output:

  Name  Maths  Science  Social studies  Highest_Marks Subject_contributed_Max
0  abc     80       70              90           90.0          Social studies
1  cde     90       60              80           90.0                   Maths
2  xyz    100       80              85           85.0          Social studies

For efficiency, avoiding computing twice the max/idxmax, you can compute the idxmax and use a lookup

s = (df
 .drop(columns='Name')
 .mask(df.eq(100))
 .idxmax(axis=1)
)

idx, cols = pd.factorize(s)
df['Highest_Marks'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
df['Subject_contributed_Max'] = s

CodePudding user response:

This will work

df_melt = df.melt('Name')
df_melt = df_melt.loc[df_melt['value'] < 100]
df_melt['RN'] = df_melt.sort_values(['value'], ascending=False).groupby(['Name']).cumcount()   1
df_melt.loc[df_melt['RN'] == 1].sort_values('Name')

CodePudding user response:

This will do it:

df = df.set_index('Name').stack().reset_index().rename(columns={
    'level_1':'Subject_contributed_Max', 0:'Highest_Marks'}).sort_values(
    ['Name','Highest_Marks'])
df = df[df['Highest_Marks'] != 100].groupby('Name').last().reset_index()[[
    'Name', 'Highest_Marks', 'Subject_contributed_Max']]

Input:

  Name  Maths  Science  Social studies
0  abc     80       70              90
1  cde     90       60              80
2  xyz    100       80              85

Output:

  Name  Highest_Marks Subject_contributed_Max
0  abc             90          Social studies
1  cde             90                   Maths
2  xyz             85          Social studies
  • Related