I am new here, and English is not my native language, so please excuse any grammatical mistakes. I'm trying to compute the mean age of blonde people from the data in df:
np.random.seed(0)
df = pd.DataFrame({'Age': [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
                   'Hair colour': ['Blonde', 'Brown', 'Black', 'Blonde', 'Blonde', 'Black', 'Brown', 'Brown', 'Black', 'Black'],
                   'Length (in cm)': np.random.normal(175, 10, 10).round(1),
                   'Weight (in kg)': np.random.normal(70, 5, 10).round(1)},
                  index=['Leon', 'Mirta', 'Nathan', 'Linda', 'Bandar', 'Violeta', 'Noah', 'Niji', 'Lucy', 'Mark'])
I need to get a single number.
First, I tried to use df.divide:
# 1. Here we import pandas
import pandas as pd
# 2. Here we import numpy
import numpy as np
ans_3 = df({'Age'}).divide(df({'Hair colour': ['Blonde']}))
However, I got this error: TypeError: 'DataFrame' object is not callable.
What should I change so that my code returns the correct result?
CodePudding user response:
You get this error because you use df(..). This is the Python syntax to call a function. You probably want df[..] instead.
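A minimal sketch of the difference, using a small made-up frame (the name demo and its values are only for illustration, not the asker's data): parentheses try to call the DataFrame, while square brackets index it.

```python
import pandas as pd

# Hypothetical three-row frame, only to illustrate the two syntaxes
demo = pd.DataFrame({"Age": [18, 21, 28]})

try:
    demo("Age")            # parentheses try to *call* the DataFrame
except TypeError as err:
    message = str(err)     # same kind of error as in the question
    print(message)

ages = demo["Age"]         # square brackets *select* the column
print(ages.tolist())
```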
To answer your question:
(
    df                                 # given your data
    [df["Hair colour"] == "Blonde"]    # only look at blonde people
    ["Age"]                            # from those rows, take the Age column
    .mean()                            # and compute the mean
)
CodePudding user response:
Run:
df[df['Hair colour'] == 'Blonde'].Age.mean()
Details:
- df['Hair colour'] == 'Blonde' - generates a Series of bool type, stating whether each row has Blonde hair.
- df[…] - gets the rows meeting the above condition.
- .Age - from those rows, takes only the Age column.
- .mean() - computes the mean age.
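The same steps can be written out one at a time, which may make each intermediate result easier to inspect. This is only a sketch: df here is a reduced copy of the question's data with just the two columns involved, and the names is_blonde, blonde_rows and blonde_ages are made up for the example.

```python
import pandas as pd

# Reduced copy of the question's data: only the two columns needed here
df = pd.DataFrame(
    {"Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
     "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                     "Black", "Brown", "Brown", "Black", "Black"]},
    index=["Leon", "Mirta", "Nathan", "Linda", "Bandar",
           "Violeta", "Noah", "Niji", "Lucy", "Mark"],
)

is_blonde = df["Hair colour"] == "Blonde"   # boolean Series, one value per row
blonde_rows = df[is_blonde]                 # rows where the mask is True
blonde_ages = blonde_rows["Age"]            # ages of Leon, Linda and Bandar
mean_age = blonde_ages.mean()
print(mean_age)  # 20.0

# Equivalent one-step selection of rows and column with .loc
mean_age_loc = df.loc[is_blonde, "Age"].mean()
```

The .loc form selects the rows and the column in a single indexing operation, which some people find easier to read than chained brackets.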
CodePudding user response:
As has been pointed out, the error arises because you are using parentheses, which are meant for calling a callable object such as a function. Instead, you should use square brackets, which are meant for slicing and selecting data.
As a piece of advice, I would suggest using the groupby method to check population statistics. Here, if you want the mean value of your observables as a function of hair colour, you can do:
df.groupby("Hair colour").mean()
which returns the following:
| Hair colour | Age | Length (in cm) | Weight (in kg) |
|---|---|---|---|
| Black | 23.75 | 175.775 | 70.7 |
| Blonde | 20.0 | 194.5666666666667 | 71.16666666666667 |
| Brown | 21.0 | 179.0 | 74.60000000000001 |
You can thus see that the average age of blonde people is 20.
If you want to retrieve this particular value, you can do:
df.groupby("Hair colour").mean()["Age"]["Blonde"]
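A possible variant, sketched here on a reduced copy of the question's data (only the Age and Hair colour columns; the names mean_age_by_hair and blonde_mean are made up): selecting the Age column before calling mean() computes only the statistic you need and does not depend on every other column being numeric.

```python
import pandas as pd

# Reduced copy of the question's data: only the two columns needed here
df = pd.DataFrame(
    {"Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
     "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                     "Black", "Brown", "Brown", "Black", "Black"]},
    index=["Leon", "Mirta", "Nathan", "Linda", "Bandar",
           "Violeta", "Noah", "Niji", "Lucy", "Mark"],
)

# Group by hair colour, keep only Age, then average within each group
mean_age_by_hair = df.groupby("Hair colour")["Age"].mean()

# The result is a Series indexed by hair colour, so one label picks one value
blonde_mean = mean_age_by_hair["Blonde"]
print(blonde_mean)  # 20.0
```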