Home > Enterprise >  How can I get the first three max values from each row in a Pandas dataframe?
How can I get the first three max values from each row in a Pandas dataframe?

Time:06-26

The dataframe is like this;

Cluster Genre 1 Genre 2 Genre 3 Genre 4 Genre 5
1 10 31 5 3 23
2 53 12 6 9 7
3 44 73 1 9 13

As output, I want something like this, so I can see what genres are the dominant ones in each cluster.

Cluster 1st 2nd 3rd
1 Genre 2 Genre 5 Genre 1
2 Genre 1 Genre 2 Genre 4
3 Genre 2 Genre 1 Genre 5

I want to show the top 3 "genres" from each cluster in a graph, I have no idea how I would do this for a row instead of columns. Is anyone here familiar with this?

CodePudding user response:

You can use numpy.argsort on df.values and axis=1 and select three largest and use df.columns for getting column name:

import pandas as pd
import numpy as np
df = df.set_index('Cluster')
res = pd.DataFrame(df.columns[np.argsort(-1*df.values,axis=1)[:, :3]], 
                   columns=['1st', '2nd',' 3rd'])
print(res)

Output:

        1st       2nd       3rd
0   Genre 2   Genre 5   Genre 1
1   Genre 1   Genre 2   Genre 4
2   Genre 2   Genre 1   Genre 5

CodePudding user response:

You can use nlargest with apply for each row as given in the code below to solve this problem,

>>> import pandas as pd
>>> 
>>> dictA = {'genre1': [10,20,56,2,30,44],'genre2': [ 9,70,96,7,3,54],'genre3': [ 12,10,16,23,39,4],'genre4': [ 40,29,26,15,43,14]}
>>>         
... dfA = pd.DataFrame(dictA)
>>> dfA.index.name = 'cluster'  
>>>     
... dfA.reset_index(inplace=True)
>>> 
>>> dfA
   cluster  genre1  genre2  genre3  genre4
0        0      10       9      12      40
1        1      20      70      10      29
2        2      56      96      16      26
3        3       2       7      23      15
4        4      30       3      39      43
5        5      44      54       4      14
>>> df1 = (dfA.set_index('cluster')
...         .apply(lambda x: pd.Series(x.nlargest(3).index), axis=1)
...         .reset_index())
>>> df1
   cluster       0       1       2
0        0  genre4  genre3  genre1
1        1  genre2  genre4  genre1
2        2  genre2  genre1  genre4
3        3  genre3  genre4  genre2
4        4  genre4  genre3  genre1
5        5  genre2  genre1  genre4

Adapted from here

  • Related