How to find the most frequent value of a column per row, where each column value is a list of values

Time:08-19

I have a dataframe that, as the result of a previous group by, contains 5 rows and two columns. Column A is a unique name, and column B contains a list of numbers (possibly repeated) that correspond to different factors related to that name. How can I find the most common number (the mode) for each row?

df = pd.DataFrame({"A": ["Name1", "Name2", ...], "B": [[3, 5, 6, 6], [1, 1, 1, 4], ...]})

I have tried:

df['C'] = df[['B']].mode(axis=1)

but this simply creates a copy of the lists from column B. I'm not sure how to access each list in this case.

Result should be:

A:      B:            C:
Name1   [3, 5, 6, 6]  6
Name2   [1, 1, 1, 4]  1

Any help would be great.

CodePudding user response:

I would use Pandas' .apply() function here. It executes a function on each element of a Series. First, we define the function; I'm taking the mode implementation from the question "Find the most common element in a list":

def mode(lst):
    return max(set(lst), key=lst.count)

Then, we apply this function to the B column to get C:

df['C'] = df['B'].apply(mode)

Our output is:

>>> df
       A             B  C
0  Name1  [3, 5, 6, 6]  6
1  Name2  [1, 1, 1, 4]  1
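
One caveat with the helper above: lst.count rescans the list for every distinct value, and when two values tie, the winner depends on set iteration order. A minimal sketch of an alternative, assuming Python 3.7 or later, counts everything in one pass with collections.Counter. The name mode_counter is hypothetical, and returning None for an empty list is an assumption of this sketch, not something the question specifies.

```python
from collections import Counter

def mode_counter(values):
    """Return the most frequent item in values, or None if values is empty.

    Hypothetical helper name. On Python 3.7+, ties go to the value
    that appears first in the list.
    """
    if not values:
        return None
    # most_common(1) returns a one-element list of (value, count) pairs.
    return Counter(values).most_common(1)[0][0]

# The two sample rows from the question, as plain lists.
rows = [[3, 5, 6, 6], [1, 1, 1, 4]]
modes = [mode_counter(row) for row in rows]
print(modes)  # [6, 1]
```

It should drop into the same place as the original helper: df['C'] = df['B'].apply(mode_counter).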

CodePudding user response:

Here's a method using the mode function from the statistics module:

from statistics import mode

Two options:

df["C"] = df["B"].apply(mode)
df.head()
#   A        B              C
# 0 Name1   [3, 5, 6, 6]    6
# 1 Name2   [1, 1, 1, 4]    1

Or

df["C"] = [mode(lst) for lst in df["B"]]
df.head()
#   A        B              C
# 0 Name1   [3, 5, 6, 6]    6
# 1 Name2   [1, 1, 1, 4]    1
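
If a list can contain ties, it may help to know that statistics.mode returns the first mode encountered on Python 3.8+, and raises StatisticsError on multimodal data in earlier versions. A small sketch, assuming Python 3.8 or later, uses statistics.multimode to get every tied value instead. The third row below is an invented tie case, not part of the question's data.

```python
from statistics import multimode

# First two rows come from the question; the third is an invented tie case.
rows = [[3, 5, 6, 6], [1, 1, 1, 4], [2, 2, 7, 7]]

# multimode returns a list of all most-frequent values, in first-seen order.
all_modes = [multimode(row) for row in rows]
print(all_modes)  # [[6], [1], [2, 7]]
```

With the dataframe from the question this would presumably be df["C"] = df["B"].apply(multimode), which leaves a list in each cell of column C rather than a single number.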