How to groupby().transform() to find mode in dataframe?

Time:07-31

I have a dataframe like:

import pandas as pd

lst = [["High", "A"], ["High", "A"], ["High", "B"], ["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns=["Class", "Grade"])

I need to get the mode (majority vote) of "Grade" in each "Class". If it's a tie vote, assign "x".

Below is what I expect to get:

Class Grade majority_vote
High A A
High A A
High B A
Medium A x
Medium B x
Medium C x

This is my code:

df['majority_vote'] = df.groupby(['Class'])['Grade'].transform(lambda x: x.mode()[0])

I expected the code to return NaN for a tie vote, so that I could replace NaN with "x" afterwards.

However, what I get is below:

Class Grade majority_vote
High A A
High A A
High B A
Medium A A
Medium B A
Medium C A

For class "Medium", the code returns the first element ("A") instead of NaN.

Any other method is appreciated. Could you please help me? Thank you in advance.

CodePudding user response:

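The issue with x.mode()[0] is that mode() returns every value tied for the highest count, in sorted order. pd.Series(['A', 'B', 'C']).mode() evaluates to ['A', 'B', 'C'], while pd.Series(['A', 'A', 'B']).mode() evaluates to ['A']. Indexing with [0] therefore always picks the first of the tied values and never produces NaN.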
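A quick check of that behaviour, as a minimal sketch (the variable names tie and clear are only for illustration):

```python
import pandas as pd

# Three values that each occur once: all of them are modes.
tie = pd.Series(["A", "B", "C"]).mode()
# "A" occurs twice, so it is the only mode.
clear = pd.Series(["A", "A", "B"]).mode()

print(tie.tolist())    # ['A', 'B', 'C']
print(clear.tolist())  # ['A']

# Taking element 0 of a tie silently picks the first sorted value.
print(tie[0])          # A
```

The length of the result is what tells a tie apart from a clear winner.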

Here is a function that will return the mode (if there is only one) and "x" if there is a tie (i.e., multiple modes).

import pandas as pd
lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns=["Class", "Grade"])

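def get_mode_or_x(series):
    # mode() returns every value tied for the highest count
    mode = series.mode()
    if mode.size == 1:
        # exactly one mode: a clear majority
        return mode.iloc[0]
    # several modes (a tie) or none at all (e.g. an all-NaN group)
    return "x"

# transform broadcasts the scalar result back to every row of the group
df["majority_vote"] = df.groupby("Class")["Grade"].transform(get_mode_or_x)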

Output:

index Class Grade majority_vote
0 High A A
1 High A A
2 High B A
3 Medium A x
4 Medium B x
5 Medium C x
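
Since the question asks for other methods, here is one possible alternative, sketched with value_counts() instead of mode(). The function name majority_or_x and its tie_label parameter are hypothetical names chosen for this example; the logic simply checks whether more than one value shares the top count.

```python
import pandas as pd


def majority_or_x(series, tie_label="x"):
    # value_counts() sorts by count, highest first (NaN is dropped by default)
    counts = series.value_counts()
    if counts.empty:
        # nothing to vote on, e.g. an all-NaN group
        return tie_label
    top_count = counts.iloc[0]
    # more than one value sharing the top count means a tie
    if (counts == top_count).sum() > 1:
        return tie_label
    return counts.index[0]


lst = [["High", "A"], ["High", "A"], ["High", "B"],
       ["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns=["Class", "Grade"])

df["majority_vote"] = df.groupby("Class")["Grade"].transform(majority_or_x)
print(df)
```

On the sample data this should give the same column as get_mode_or_x; which version to prefer is mostly a matter of taste.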