TypeError: 'DataFrame' object is not callable

I am a newbie here. English is not my native language, so excuse any grammatical mistakes. I'm trying to compute the mean age of blonde people from the data in df:

    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame({'Age': [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
                       'Hair colour': ['Blonde', 'Brown', 'Black', 'Blonde', 'Blonde', 'Black', 'Brown', 'Brown', 'Black', 'Black'],
                       'Length (in cm)': np.random.normal(175, 10, 10).round(1),
                       'Weight (in kg)': np.random.normal(70, 5, 10).round(1)},
                      index=['Leon', 'Mirta', 'Nathan', 'Linda', 'Bandar', 'Violeta', 'Noah', 'Niji', 'Lucy', 'Mark'])

I need to get a single number.

First, I tried to use df.divide:

    # 1. Here we import pandas
    import pandas as pd
    # 2. Here we import numpy
    import numpy as np
    ans_3 = df({'Age'}).divide(df({'Hair colour': ['Blonde']}))

However, I got this error: TypeError: 'DataFrame' object is not callable.

How should I change my code to get the correct result?

CodePudding user response：

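You get this error because you use df(...). In Python, parentheses after a name are the syntax for calling a function, and a DataFrame is not a function. You probably want df[...] instead, which selects data from the DataFrame.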

To answer your question:

(
    df  # given your data
    [df["Hair colour"] == "Blonde"]  # only look at blonde people
    ["Age"]  # for those in the Age column
    .mean()  # and compute the mean
)
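To make the difference between the two syntaxes concrete, here is a minimal sketch. The three-row frame and the names call_failed and ages are made up for this example and are not taken from the question.

```python
import pandas as pd

# Toy data, assumed for illustration only (not the frame from the question)
df = pd.DataFrame({"Age": [18, 21, 19],
                   "Hair colour": ["Blonde", "Brown", "Blonde"]})

# Parentheses try to call the DataFrame like a function, which fails
try:
    df("Age")
    call_failed = False
except TypeError as err:
    call_failed = True
    print(err)

# Square brackets index into the DataFrame and return the Age column
ages = df["Age"]
print(ages.tolist())
```

The try/except is only there so that the script keeps running after the failing call; in normal code you would simply use the bracket form.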

CodePudding user response：

Run:

df[df['Hair colour'] == 'Blonde'].Age.mean()

Details:

df['Hair colour'] == 'Blonde' - generates a Series of bool type, stating whether the current row has Blonde hair.
df[…] - get rows meeting the above condition.
Age - from the above rows take only Age column.
mean() - compute the mean age.
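The same steps can be written out one at a time, which may help when debugging. This is only a sketch: it rebuilds just the Age and Hair colour columns from the question, and the intermediate variable names are my own. The last line shows an equivalent form using .loc, which selects rows and column in one step.

```python
import pandas as pd

# Only the two columns needed here, with the values from the question
df = pd.DataFrame(
    {"Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
     "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                     "Black", "Brown", "Brown", "Black", "Black"]})

mask = df["Hair colour"] == "Blonde"  # boolean Series, one entry per row
blonde_rows = df[mask]                # keep only the rows where mask is True
blonde_ages = blonde_rows["Age"]      # take the Age column of those rows
mean_age = blonde_ages.mean()         # mean of 18, 19 and 23

# Equivalent selection of rows and column in a single .loc call
mean_age_loc = df.loc[mask, "Age"].mean()

print(mean_age, mean_age_loc)
```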

CodePudding user response：

As has been pointed out, the error arises because you are using parentheses, which are meant for calling a callable object such as a function. Instead, you should use square brackets, which are meant for indexing and selecting data.

As a piece of advice, I would suggest using the groupby method to look at statistics per group. Here, if you want the mean value of each column as a function of hair colour, you can do:

df.groupby("Hair colour").mean()

This returns the following:

Hair colour    Age    Length (in cm)       Weight (in kg)
Black          23.75  175.775              70.7
Blonde         20.0   194.5666666666667    71.16666666666667
Brown          21.0   179.0                74.60000000000001

You can thus see that the average age of blonde people is 20.

To retrieve this particular value, you can do:

df.groupby("Hair colour").mean()["Age"]["Blonde"]
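If only the age is of interest, a possible variation is to select the Age column before aggregating, so that the means of the other columns are never computed. The sketch below assumes a frame with just the two relevant columns from the question; age_by_hair and blonde_mean are names I introduced for the example.

```python
import pandas as pd

# Only the two columns needed here, with the values from the question
df = pd.DataFrame(
    {"Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
     "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                     "Black", "Brown", "Brown", "Black", "Black"]})

# Group by hair colour, keep only Age, then take the mean of each group
age_by_hair = df.groupby("Hair colour")["Age"].mean()

# The result is a Series indexed by hair colour
blonde_mean = age_by_hair["Blonde"]
print(blonde_mean)
```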