I'm using python 3.x I have pandas data frame df. df looks like
Name Maths Science Social studies
abc 80 70 90
cde 90 60 80
xyz 100 80 85
...
...
I would like to generate a pandas data frame which will store student name, maximum marks & the subject contributed maximum marks. If maximum marks is 100 then it will consider next highest instead of 100. So my output data frame will look like
Name Highest_Marks Subject_contributed_Max
abc 90 Social Studies
cde 90 Maths
xyz 85 Social Studies
Can you suggest me how to do it?
CodePudding user response:
You can use:
df2 = df.drop(columns='Name').mask(df.eq(100))
df['Highest_Marks'] = df2.max(axis=1)
df['Subject_contributed_Max'] = df2.idxmax(axis=1)
output:
Name Maths Science Social studies Highest_Marks Subject_contributed_Max
0 abc 80 70 90 90.0 Social studies
1 cde 90 60 80 90.0 Maths
2 xyz 100 80 85 85.0 Social studies
For efficiency, avoiding computing twice the max/idxmax, you can compute the idxmax
and use a lookup
s = (df
.drop(columns='Name')
.mask(df.eq(100))
.idxmax(axis=1)
)
idx, cols = pd.factorize(s)
df['Highest_Marks'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
df['Subject_contributed_Max'] = s
CodePudding user response:
This will work
df_melt = df.melt('Name')
df_melt = df_melt.loc[df_melt['value'] < 100]
df_melt['RN'] = df_melt.sort_values(['value'], ascending=False).groupby(['Name']).cumcount() 1
df_melt.loc[df_melt['RN'] == 1].sort_values('Name')
CodePudding user response:
This will do it:
df = df.set_index('Name').stack().reset_index().rename(columns={
'level_1':'Subject_contributed_Max', 0:'Highest_Marks'}).sort_values(
['Name','Highest_Marks'])
df = df[df['Highest_Marks'] != 100].groupby('Name').last().reset_index()[[
'Name', 'Highest_Marks', 'Subject_contributed_Max']]
Input:
Name Maths Science Social studies
0 abc 80 70 90
1 cde 90 60 80
2 xyz 100 80 85
Output:
Name Highest_Marks Subject_contributed_Max
0 abc 90 Social studies
1 cde 90 Maths
2 xyz 85 Social studies