I am trying to get the most frequent value of a variable within each group of a dataset in Python. For example, I want to know the most frequent preferred color per state.
import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack', 'John', 'Bettany', 'Leo', 'Aubrie', 'Martha', 'Grant'],
        'Age': [20, 21, 19, 18, 24, 25, 26, 26, 27, 25],
        'Prefered color': ['green', 'green', 'red', 'blue', 'white', 'black', 'green', 'blue', 'red', 'white'],
        'state': ['Utah', 'Utah', 'Idaho', 'California', 'Texas', 'Arizona', 'Idaho', 'California', 'Idaho', 'Texas']}
df = pd.DataFrame(data)
df
I would like to see a table like this:
Utah - Green
Idaho - Red
Texas - White
Arizona - Blue
CodePudding user response:
Try with groupby and mode. Since a Series can have multiple modes, you can concatenate them:
>>> df.groupby("state")["Prefered color"].agg(lambda x: x.mode().str.cat(sep=","))
state
Arizona black
California blue,red
Idaho blue,green,red
Texas white
Utah green
Name: Prefered color, dtype: object
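If you want exactly one color per state rather than a comma-joined list, one option is to keep only the first mode. The sketch below is an illustration and not part of the answer above: the helper name first_mode and the small tied Series are made up for the example, and it assumes the data exactly as posted in the question. Series.mode() returns tied values in sorted order, so "first" here means alphabetically first.

```python
import pandas as pd

# Same columns as in the question (only the two that matter here).
data = {
    "Prefered color": ["green", "green", "red", "blue", "white",
                       "black", "green", "blue", "red", "white"],
    "state": ["Utah", "Utah", "Idaho", "California", "Texas",
              "Arizona", "Idaho", "California", "Idaho", "Texas"],
}
df = pd.DataFrame(data)


def first_mode(colors: pd.Series) -> str:
    """Return a single mode; ties are broken by sort order (hypothetical helper)."""
    return colors.mode().iloc[0]


# One most-frequent color per state.
top_color = df.groupby("state")["Prefered color"].agg(first_mode)
print(top_color)

# A made-up tied group, to show what the comma-joined version produces.
tied = pd.Series(["red", "blue", "red", "blue"])
all_modes = tied.mode().str.cat(sep=",")
print(all_modes)  # blue,red
```

With the data as posted, every state has a single most frequent color, so both approaches give the same result here. They only differ when a group has a tie.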