I am a newbie here. English is not my native language, so please excuse any grammatical mistakes. I need to compute the average BMI per hair colour using this DataFrame:
# 1. Here we import pandas
import pandas as pd
# 2. Here we import numpy
import numpy as np
np.random.seed(0)
df = pd.DataFrame({'Age':[18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
'Hair colour':['Blonde', 'Brown', 'Black', 'Blonde', 'Blonde', 'Black','Brown', 'Brown', 'Black', 'Black'],
'Length (in cm)':np.random.normal(175, 10, 10).round(1),
'Weight (in kg)':np.random.normal(70, 5, 10).round(1)},
index = ['Leon', 'Mirta', 'Nathan', 'Linda', 'Bandar', 'Violeta', 'Noah', 'Niji', 'Lucy', 'Mark'],)
The result should be a vector labelled by hair colour (one average BMI per colour).
First, I wrote a function for BMI:
# function
def BMI():
    df['weight (in kg)'] / (df['Length']/100)**2
However, I don't know what my next step is.
Can you advise me on how to find the average BMI per hair colour?
Answer:
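You can use df.groupby(), which is part of pandas.
For your particular case, once a BMI column has been added to df (for example df['BMI'] = df['Weight (in kg)'] / (df['Length (in cm)']/100)**2), you may use
df.groupby('Hair colour')['BMI'].mean()
Selecting the BMI column before calling mean() means only that column is averaged.
This gives the output: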
Hair colour
Black 23.003356
Blonde 18.806844
Brown 23.271460
Name: BMI, dtype: float64
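A self-contained sketch of this approach, assuming the DataFrame is rebuilt exactly as in the question (same seed) and the BMI column is computed first from the real column names; the name avg_bmi is only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (same seed, so the same random values)
np.random.seed(0)
df = pd.DataFrame(
    {
        "Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
        "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                        "Black", "Brown", "Brown", "Black", "Black"],
        "Length (in cm)": np.random.normal(175, 10, 10).round(1),
        "Weight (in kg)": np.random.normal(70, 5, 10).round(1),
    },
    index=["Leon", "Mirta", "Nathan", "Linda", "Bandar",
           "Violeta", "Noah", "Niji", "Lucy", "Mark"],
)

# BMI = weight (kg) / height (m) squared; the length column is in cm
df["BMI"] = df["Weight (in kg)"] / (df["Length (in cm)"] / 100) ** 2

# One mean per hair colour, returned as a Series indexed by colour
avg_bmi = df.groupby("Hair colour")["BMI"].mean()
print(avg_bmi)
```

The result is a Series whose index holds the hair colours, which is the "vector with names" the question asks for.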
Answer:
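You can either filter or use groupby.
Your BMI function does not make sense because you are:
- referencing columns that do not exist ('weight (in kg)' and 'Length' instead of 'Weight (in kg)' and 'Length (in cm)')
- doing nothing with the result (there is no return statement, so the computed values are discarded)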
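If you want to keep the function-based style, a corrected version might look like the sketch below. It assumes the question's DataFrame; the lowercase name bmi and its frame parameter are hypothetical choices, not from the original code. The function takes the DataFrame as an argument, uses the real column names and returns the result:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (same seed, so the same random values)
np.random.seed(0)
df = pd.DataFrame(
    {
        "Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
        "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                        "Black", "Brown", "Brown", "Black", "Black"],
        "Length (in cm)": np.random.normal(175, 10, 10).round(1),
        "Weight (in kg)": np.random.normal(70, 5, 10).round(1),
    },
    index=["Leon", "Mirta", "Nathan", "Linda", "Bandar",
           "Violeta", "Noah", "Niji", "Lucy", "Mark"],
)


def bmi(frame):
    """Return the BMI of every row as a Series (hypothetical helper)."""
    height_m = frame["Length (in cm)"] / 100  # convert cm to m
    return frame["Weight (in kg)"] / height_m ** 2


# assign() calls bmi(df) because it is given a callable, and stores the result as BMI
avg_bmi = df.assign(BMI=bmi).groupby("Hair colour")["BMI"].mean()
print(avg_bmi)
```

Because the function returns a value instead of discarding it, its output can be stored as a column and then grouped.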
Filtering:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame({'Age':[18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
'Hair colour':['Blonde', 'Brown', 'Black', 'Blonde',
'Blonde', 'Black','Brown', 'Brown', 'Black',
'Black'],
'Length (in cm)':np.random.normal(175, 10, 10).round(1),
'Weight (in kg)':np.random.normal(70, 5, 10).round(1)},
index = ['Leon', 'Mirta', 'Nathan', 'Linda', 'Bandar',
'Violeta', 'Noah', 'Niji', 'Lucy', 'Mark'],)
print(df)
# calculate BMI - not as function, using correct column names
df["BMI"] = df['Weight (in kg)'] / (df['Length (in cm)']/100)**2
print(df)
# filter to brown
brown = df[df["Hair colour"] == "Brown"]
print(brown)
print(brown["BMI"].mean())
Output:
# calculated BMI
Age Hair colour Length (in cm) Weight (in kg) BMI
Leon 18 Blonde 192.6 70.7 19.059296
Mirta 21 Brown 179.0 77.3 24.125339
Nathan 28 Black 184.8 73.8 21.609884
Linda 19 Blonde 197.4 70.6 18.118006
Bandar 23 Blonde 193.7 72.2 19.243229
Violeta 22 Black 165.2 71.7 26.272359
Noah 18 Brown 184.5 77.5 22.767165
Niji 24 Brown 173.5 69.0 22.921875
Lucy 25 Black 174.0 71.6 23.649095
Mark 20 Black 179.1 65.7 20.482087
# filtered output
Age Hair colour Length (in cm) Weight (in kg) BMI
Mirta 21 Brown 179.0 77.3 24.125339
Noah 18 Brown 184.5 77.5 22.767165
Niji 24 Brown 173.5 69.0 22.921875
# avg BMI
23.271459786871446
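The filter above only gives the Brown average. A minimal sketch of extending the same filtering idea to every hair colour, assuming the question's DataFrame; the means dictionary and avg_bmi names are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (same seed, so the same random values)
np.random.seed(0)
df = pd.DataFrame(
    {
        "Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
        "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                        "Black", "Brown", "Brown", "Black", "Black"],
        "Length (in cm)": np.random.normal(175, 10, 10).round(1),
        "Weight (in kg)": np.random.normal(70, 5, 10).round(1),
    },
    index=["Leon", "Mirta", "Nathan", "Linda", "Bandar",
           "Violeta", "Noah", "Niji", "Lucy", "Mark"],
)
df["BMI"] = df["Weight (in kg)"] / (df["Length (in cm)"] / 100) ** 2

# Filter once per colour and store each mean under the colour's name
means = {}
for colour in df["Hair colour"].unique():
    subset = df[df["Hair colour"] == colour]
    means[colour] = subset["BMI"].mean()

# Turn the dictionary into a Series labelled by hair colour
avg_bmi = pd.Series(means, name="BMI")
print(avg_bmi)
```

unique() keeps the order in which colours first appear, so the result is not sorted alphabetically as with groupby.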
Groupby:
# use groupby
grouped = df.groupby('Hair colour')
print(*grouped, sep="\n\n")
# https://stackoverflow.com/questions/51091331
print(grouped.get_group("Brown")["BMI"].mean())
Output:
# grouped output
('Black', Age Hair colour Length (in cm) Weight (in kg) BMI
Nathan 28 Black 184.8 73.8 21.609884
Violeta 22 Black 165.2 71.7 26.272359
Lucy 25 Black 174.0 71.6 23.649095
Mark 20 Black 179.1 65.7 20.482087)
('Blonde', Age Hair colour Length (in cm) Weight (in kg) BMI
Leon 18 Blonde 192.6 70.7 19.059296
Linda 19 Blonde 197.4 70.6 18.118006
Bandar 23 Blonde 193.7 72.2 19.243229)
('Brown', Age Hair colour Length (in cm) Weight (in kg) BMI
Mirta 21 Brown 179.0 77.3 24.125339
Noah 18 Brown 184.5 77.5 22.767165
Niji 24 Brown 173.5 69.0 22.921875)
# avg BMI
23.271459786871446
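As the grouped output shows, iterating over a GroupBy object yields (group name, sub-DataFrame) pairs. A short sketch that uses this to collect the average for every colour at once, assuming the question's DataFrame; avg_bmi is an illustrative name:

```python
import numpy as np
import pandas as pd

# Rebuild the question's DataFrame (same seed, so the same random values)
np.random.seed(0)
df = pd.DataFrame(
    {
        "Age": [18, 21, 28, 19, 23, 22, 18, 24, 25, 20],
        "Hair colour": ["Blonde", "Brown", "Black", "Blonde", "Blonde",
                        "Black", "Brown", "Brown", "Black", "Black"],
        "Length (in cm)": np.random.normal(175, 10, 10).round(1),
        "Weight (in kg)": np.random.normal(70, 5, 10).round(1),
    },
    index=["Leon", "Mirta", "Nathan", "Linda", "Bandar",
           "Violeta", "Noah", "Niji", "Lucy", "Mark"],
)
df["BMI"] = df["Weight (in kg)"] / (df["Length (in cm)"] / 100) ** 2

grouped = df.groupby("Hair colour")

# Each iteration gives a colour and the rows that have that colour
avg_bmi = pd.Series(
    {colour: group["BMI"].mean() for colour, group in grouped},
    name="BMI",
)
print(avg_bmi)
```

In practice grouped["BMI"].mean() produces the same Series in one call; the loop just makes the per-group step visible.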