Get the most frequent value of several variables

Time:05-18

I am trying to get the most frequent value for each variable in a dataset in Python. For example, I want to know the most frequent preferred color per state.

import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack', 'John', 'Bettany', 'Leo', 'Aubrie', 'Martha', 'Grant'],
        'Age': [20, 21, 19, 18, 24, 25, 26, 26, 27, 25],
        'Prefered color': ['green', 'green', 'red', 'blue', 'white', 'black', 'green', 'blue', 'red', 'white'],
        'state': ['Utah', 'Utah', 'Idaho', 'California', 'Texas', 'Arizona', 'Idaho', 'California', 'Idaho', 'Texas']}
df = pd.DataFrame(data)
df

I would like to see a table like this:

Utah - Green 
Idaho - Red
Texas - White
Arizona - Black

CodePudding user response:

Try groupby with mode. Since a Series can have multiple modes (ties), you can join them into a single string:

>>> df.groupby("state")["Prefered color"].agg(lambda x: x.mode().str.cat(sep=","))
state
Arizona       black
California     blue
Idaho           red
Texas         white
Utah          green
Name: Prefered color, dtype: object
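
If you want exactly one value per state, and for several columns at once, a minimal sketch along the same lines is below. The helper name top_value and the tie-breaking rule (take the first mode, which is the smallest one because mode() returns sorted values) are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack', 'John', 'Bettany', 'Leo', 'Aubrie', 'Martha', 'Grant'],
        'Age': [20, 21, 19, 18, 24, 25, 26, 26, 27, 25],
        'Prefered color': ['green', 'green', 'red', 'blue', 'white', 'black', 'green', 'blue', 'red', 'white'],
        'state': ['Utah', 'Utah', 'Idaho', 'California', 'Texas', 'Arizona', 'Idaho', 'California', 'Idaho', 'Texas']}
df = pd.DataFrame(data)


def top_value(series):
    """Hypothetical helper: return the first mode of a group, or None if the group has no values."""
    modes = series.mode()  # sorted; contains more than one entry when there is a tie
    return modes.iloc[0] if not modes.empty else None


# One row per state, one column per variable, each cell holding that variable's most frequent value.
most_frequent = df.groupby("state")[["Prefered color", "Age"]].agg(top_value)

# Print the colors in the "State - Color" layout from the question.
for state, color in most_frequent["Prefered color"].items():
    print(f"{state} - {color.capitalize()}")
```

Ties are resolved silently here, so keep the str.cat version if you need to see every tied value.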