Home > Blockchain >  Python : How to return most occurrent value on each row depend on fix columns?
Python : How to return most occurrent value on each row depend on fix columns?

Time:11-28

I have a dataframe as below:

import pandas as pd
 
# intialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
        'Book1':[20, 21, 19, 18],
       'Book2':[20,'', 12, 20],
       'Book3':[31, 21, 17, 16],
       'Book4':[31, 19, 18, 16]}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
print(df)
Name  Book1 Book2  Book3  Book4
Tom     20    20     31     31
nick    21           21     19
krish   19    12     17     18
jack    18    20     16     16

I wish to get below output which is comparing Book1, Book2, Book3 and Book4 column. For row Tom output, there are two 20 and two 31, since the number of the value is equal valent so in this case it will prefer the value come fist that is Book1, so the Output column is 20. For row nick, there was two number 21 and one number 19, so it will take the most occurrence number for output column which is number 21. While for krish row, there was no repeated number so the output column i want fix it as "Mix" .

Output column as below:

Name  Book1 Book2  Book3  Book4  Output
Tom     20    20     31     31     20
nick    21           21     19     21
krish   19    12     17     18     Mix
jack    18    20     16     16     16

Anyone have ideas? I saw there is mode function but it was not applicable for this case, please help, thanks

CodePudding user response:

Use value_counts:

max_val = lambda x: x.value_counts().index[0] \
                        if x.value_counts().iloc[0] > 1 else 'Mix'

df['Output'] = df.filter(like='Book').apply(max_val, axis=1)
print(df)

# Output:
    Name  Book1 Book2  Book3  Book4 Output
0    Tom     20    20     31     31     20
1   nick     21           21     19     21
2  krish     19    12     17     18    Mix
3   jack     18    20     16     16     16

Update

If you use Python >= 3.8, you can use the walrus operator (avoid a double call to value_counts:

max_val = lambda x: v.index[0] if (v := x.value_counts()).iloc[0] > 1 else 'Mix'
df['Output'] = df.filter(like='Book').apply(max_val, axis=1)

CodePudding user response:

We can use your idea on mode to get your desired output. First, we need to convert the relevant columns to numeric data types:

temp = (df
        .filter(like='Book')
        .apply(pd.to_numeric)
        .mode(1)
        )

# compute for values
# nulls exist  only if there are duplicates
output = np.where(temp.notna().all(1),
                  # value if True
                  'Mix',
                  # if False, pick the first modal value, 
                  temp.iloc[:, 0]) 

df.assign(output = output)

    Name  Book1 Book2  Book3  Book4 output
0    Tom     20    20     31     31   20.0
1   nick     21           21     19   21.0
2  krish     19    12     17     18    Mix
3   jack     18    20     16     16   16.0

  • Related