Home > Net >  Pandas DataFrame: Print out the most common color among dogs
Pandas DataFrame: Print out the most common color among dogs

Time:04-10

enter image description here

I'm given the following DataFrame, and I want get the most common color among dogs. I have started by making this table in python as a DataFrame:

import pandas as pd

table = {
  "age" : [4,3,6,4,2,5],
  "color" : ["brown","grey","golden","grey","black","brown"],
  "fur" : ["shaggy","short","curly","shaggy","curly","shaggy"],
  "name" : ["odie","gabe","samosa","gabe","bob marker","odie"]
};

df = pd.DataFrame(table);

Then I have grouped by column color and counted the amount of each occuring color in descending order, such that the first row has the most occuring / common color among dogs.

df = df.groupby("color").color.agg(["count"]).sort_values(["count"],ascending=False);

At this point, the DataFrame df looks like that:

        count
color        
brown       2
grey        2
black       1
golden      1

Now to actually print out the most common color among dogs, I have done this:

print(df[df["count"]==df["count"].max()].index);

Where index is the color.

Final output:

Index(['brown', 'grey'], dtype='object', name='color')

Would you say the way I print it out is correct to the question? And isn't there an easier way to do this with pandas? It feels like I did it too complicated and it can be done faster :/

CodePudding user response:

You could use value_counts:

counts = df['color'].value_counts()
out = counts[counts==counts.max()].index.tolist()

You could use pipe to chain these two lines into a one-liner:

out = df['color'].value_counts().pipe(lambda x: x[x==x.max()].index).tolist()

Output:

['brown', 'grey']
  • Related