Compare values per row

To build an ensemble model, I want to create a table with all the results of a classification. Then, for each row, I want to count the number of distinct values and find the most frequent value.

Let's say the initial table looks like:

+----+--------+--------+--------+
|    |   col1 |   col2 |   col3 |
|----+--------+--------+--------|
|  0 |      1 |      2 |      3 | <- 3 different values, no most frequent one, take largest (3)
|  1 |      2 |      2 |      2 | <- 1 value, 2 is most frequent
|  2 |      3 |      2 |      2 | <- 2 values, 2 is most frequent
+----+--------+--------+--------+

If there is no single most frequent value, as in row 0, the largest value should be taken (here, 3).

Final result should look like:

+----+--------+--------+--------+--------------------+-----------------+
|    |   col1 |   col2 |   col3 |   different_values |   most_frequent |
|----+--------+--------+--------+--------------------+-----------------|
|  0 |      1 |      2 |      3 |                  3 |               3 |
|  1 |      2 |      2 |      2 |                  1 |               2 |
|  2 |      3 |      2 |      2 |                  2 |               2 |
+----+--------+--------+--------+--------------------+-----------------+

I know how to solve it column by column, but I'm struggling with row by row.

MWE

Data:

import pandas as pd

df = pd.DataFrame({
    "col1":[1,2,3],
    "col2":[2,2,2],
    "col3":[3,2,2]
})

Result:

df["different_values"] = [3,1,2]
df["most_frequent"] = [3, 2, 2]

CodePudding user response:

Check nunique and mode with axis=1:

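cols = ["col1", "col2", "col3"]  # compare only the original columns

# number of distinct values per row
df["different_values"] = df[cols].nunique(axis=1)

# mode(axis=1) returns one column per tied mode (padded with NaN),
# so reduce it with max(axis=1) to take the largest value on ties
df["most_frequent"] = df[cols].mode(axis=1).max(axis=1).astype(int)
# df[cols].mode(axis=1).min(axis=1)  # use this instead to take the smallest on ties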
df
Out[73]: 
   col1  col2  col3  different_values  most_frequent
0     1     2     3                 3              3
1     2     2     2                 1              2
2     3     2     2                 2              2
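
For reference, here is a self-contained sketch of the same approach that can be pasted into a fresh session. The helper name row_summary and the cols argument are illustrative and not part of the question; the sketch assumes integer class labels with no missing predictions, and it breaks ties by taking the largest value, as asked.

```python
import pandas as pd


def row_summary(df, cols):
    """Return a copy of df with two extra per-row columns.

    different_values: number of distinct values across `cols`
    most_frequent:    most frequent value across `cols`; ties go to the largest
    """
    out = df.copy()
    values = out[cols]

    out["different_values"] = values.nunique(axis=1)

    # mode(axis=1) yields one column per tied mode, padded with NaN.
    # max(axis=1) skips NaN by default, so it picks the largest tied value.
    modes = values.mode(axis=1)
    # Assumption: labels are integers and no row is entirely missing,
    # so casting the float result back to int is safe.
    out["most_frequent"] = modes.max(axis=1).astype(int)
    return out


df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [2, 2, 2],
    "col3": [3, 2, 2],
})

result = row_summary(df, ["col1", "col2", "col3"])
print(result)
```

Passing the column list explicitly keeps the helper columns out of the comparison, so the result does not depend on the order in which the two new columns are added. If predictions can be missing, dropping the astype(int) cast is probably the safer choice.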