Find majority elements in a dataframe (PANDAS)

Time:11-18

I need to build a majority vote (3 out of 5) over the int64 values in the five columns and store the result in a new column, Voting:

   Column1  Column2  Column3  Column4  Column5
0        0        0        6        1        0
1        4        4        6        4        0
2        4        2        2        2        2
3        4        4        4        4        4
4        0        0        0        2        4
5        6        6        6        6        6
6        3        3        3        3        5
7        0        6        6        0        4
8        3        3        3        3        4
9        2        2        4        2        2

My expected result is:

   Column1  Column2  Column3  Column4  Column5  Voting
0        0        0        6        1        0       0
1        4        4        6        4        0       4
2        4        2        2        2        2       2
3        4        4        4        4        4       4
4        0        0        0        2        4       0
5        6        6        6        6        6       6
6        3        3        3        3        5       3
7        0        6        6        0        4      -1
8        3        3        3        3        4       3
9        2        2        4        3        3      -1

where -1 is returned when there is a tie (e.g. two pairs of equal values), so no value has a majority.

Thanks a lot. 
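Note that "most frequent value" (the mode) and "at least 3 of 5 votes" are not always the same: in a row like 1 2 3 4 4 the mode is 4, but no value has 3 votes. If the 3/5 rule is meant literally, a minimal sketch using Series.value_counts on each row could look like this. It assumes every column holds integer votes with no NaN; the names majority_vote and THRESHOLD are hypothetical, chosen for illustration.

```python
import pandas as pd

# Input data from the question
df = pd.DataFrame(
    [[0, 0, 6, 1, 0],
     [4, 4, 6, 4, 0],
     [4, 2, 2, 2, 2],
     [4, 4, 4, 4, 4],
     [0, 0, 0, 2, 4],
     [6, 6, 6, 6, 6],
     [3, 3, 3, 3, 5],
     [0, 6, 6, 0, 4],
     [3, 3, 3, 3, 4],
     [2, 2, 4, 2, 2]],
    columns=[f"Column{i}" for i in range(1, 6)],
)

THRESHOLD = 3  # hypothetical name: votes needed for a majority (3 of 5)

def majority_vote(row, threshold=THRESHOLD, no_majority=-1):
    """Return the value seen at least `threshold` times in the row, else `no_majority`."""
    counts = row.value_counts()  # counts per distinct value, largest first
    if counts.iloc[0] >= threshold:
        return counts.index[0]
    return no_majority

df["Voting"] = df.apply(majority_vote, axis=1)  # one call per row
print(df)
```

With this input the Voting column is 0, 4, 2, 4, 0, 6, 3, -1, 3, 2. For a row such as 1 2 3 4 4 it returns -1, whereas the mode-based answers below return 4. With 5 columns at most one value can reach 3 votes, so taking the first entry of value_counts is unambiguous.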

CodePudding user response:

Try pd.Series.mode:

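def f(x):
    result = x.mode()  # every most-frequent value in the row, sorted
    # a single mode is the winner; two or more modes means a tie
    return result.iloc[0] if len(result) == 1 else -1

df['vote'] = df.T.apply(f)  # transpose so each original row is passed to f
print(df)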

Output:

   Column1  Column2  Column3  Column4  Column5  vote
0        0        0        6        1        0     0
1        4        4        6        4        0     4
2        4        2        2        2        2     2
3        4        4        4        4        4     4
4        0        0        0        2        4     0
5        6        6        6        6        6     6
6        3        3        3        3        5     3
7        0        6        6        0        4    -1
8        3        3        3        3        4     3
9        2        2        4        2        2     2
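For a quick end-to-end check, here is a self-contained version of the answer above that rebuilds the sample frame from the question. It calls df.apply(f, axis=1), which should give the same result as df.T.apply(f) without building the transposed frame first; the function f itself is unchanged apart from using iloc for positional access.

```python
import pandas as pd

# Input data from the question
df = pd.DataFrame(
    [[0, 0, 6, 1, 0],
     [4, 4, 6, 4, 0],
     [4, 2, 2, 2, 2],
     [4, 4, 4, 4, 4],
     [0, 0, 0, 2, 4],
     [6, 6, 6, 6, 6],
     [3, 3, 3, 3, 5],
     [0, 6, 6, 0, 4],
     [3, 3, 3, 3, 4],
     [2, 2, 4, 2, 2]],
    columns=[f"Column{i}" for i in range(1, 6)],
)

def f(x):
    result = x.mode()  # all most-frequent values, sorted ascending
    return result.iloc[0] if len(result) == 1 else -1  # -1 on a tie

df["vote"] = df.apply(f, axis=1)  # row-wise; equivalent to df.T.apply(f)
print(df)
```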

CodePudding user response:

You can use mode and np.where():

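import numpy as np

m = df.mode(axis=1)  # one column per mode; rows with fewer modes are NaN-padded
# more than one non-NaN mode means a tie -> -1
df['Voting'] = np.where(m.count(axis=1) > 1, -1, m[0])
print(df)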
'''
   Column1  Column2  Column3  Column4  Column5  Voting
0        0        0        6        1        0     0.0
1        4        4        6        4        0     4.0
2        4        2        2        2        2     2.0
3        4        4        4        4        4     4.0
4        0        0        0        2        4     0.0
5        6        6        6        6        6     6.0
6        3        3        3        3        5     3.0
7        0        6        6        0        4    -1.0
8        3        3        3        3        4     3.0
9        2        2        4        2        2     2.0
'''
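The Voting column above comes out as float (0.0, 4.0, ...) because df.mode(axis=1) pads rows that have fewer modes with NaN, which forces a float dtype. If you want int64 as in the question, one option is to cast the result; that should be safe here because the first mode column never contains NaN when the input has none. A minimal sketch, again rebuilding the question's data and using the row's non-NaN mode count as the tie test:

```python
import numpy as np
import pandas as pd

# Input data from the question
df = pd.DataFrame(
    [[0, 0, 6, 1, 0],
     [4, 4, 6, 4, 0],
     [4, 2, 2, 2, 2],
     [4, 4, 4, 4, 4],
     [0, 0, 0, 2, 4],
     [6, 6, 6, 6, 6],
     [3, 3, 3, 3, 5],
     [0, 6, 6, 0, 4],
     [3, 3, 3, 3, 4],
     [2, 2, 4, 2, 2]],
    columns=[f"Column{i}" for i in range(1, 6)],
)

m = df.mode(axis=1)               # one column per tied mode, NaN-padded
tie = m.count(axis=1) > 1         # more than one non-NaN mode means a tie
df["Voting"] = np.where(tie, -1, m[0]).astype("int64")  # back to integers
print(df)
```

Counting non-NaN modes also avoids a KeyError on m[1] when no row in the frame has a tie, because in that case mode returns only a single column.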

CodePudding user response:

Using DataFrame.mode:

# a row has a unique mode when exactly one entry of its mode row is non-NaN
df['Voting'] = df.mode(axis=1).apply(lambda x: x.iloc[0] if x.count() == 1 else -1, axis=1)
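All of the answers call a Python function once per row, which can be slow on large frames. A vectorized alternative (not taken from the answers above; a sketch assuming all columns hold integer votes with no NaN) compares every cell in a row with every other cell via NumPy broadcasting, then keeps the most frequent value only if it reaches 3 votes, which applies the strict 3/5 rule rather than "most frequent value". It builds an n × 5 × 5 boolean array, fine for five columns but growing quadratically with the number of columns.

```python
import numpy as np
import pandas as pd

# Input data from the question
df = pd.DataFrame(
    [[0, 0, 6, 1, 0],
     [4, 4, 6, 4, 0],
     [4, 2, 2, 2, 2],
     [4, 4, 4, 4, 4],
     [0, 0, 0, 2, 4],
     [6, 6, 6, 6, 6],
     [3, 3, 3, 3, 5],
     [0, 6, 6, 0, 4],
     [3, 3, 3, 3, 4],
     [2, 2, 4, 2, 2]],
    columns=[f"Column{i}" for i in range(1, 6)],
)

a = df.to_numpy()                                     # shape (n_rows, 5)
# counts[i, j] = number of cells in row i equal to a[i, j]
counts = (a[:, :, None] == a[:, None, :]).sum(axis=2)
best = counts.argmax(axis=1)                          # column of a most-frequent value
rows = np.arange(len(a))
df["Voting"] = np.where(counts[rows, best] >= 3, a[rows, best], -1)
print(df)
```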