Home > Software engineering >  Using GroupBy to group and filter data from excel
Using GroupBy to group and filter data from excel

Time:08-05

pets.xlsx

Im trying to group rows of data in a .xlsx file(see image) by Animal and then get a list of Pet Names that meet one condition: Age(yrs) is greater than the mean Age(yrs) of Animal group. Current Code: import pandas as pd df = pd.read_excel('pets.xlsx')

above_avg = []
animal = df.groupby('Animal')
avg_age = animal.mean()

for index, row in df.iterrows():
    if row['Age(yrs)'] > avg:
        above_avg.append(row['Pet Name'])
print(above_avg)

Output (error):

if row['Age(yrs)'] > avg:
f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

How can I correct this error in my code so that I get the desired output?:

['Dusty', 'Clifford', 'Shelly']

CodePudding user response:

Your method is correct but you 're not using the proper column for calculating mean. This should fix the problem

above_avg = []
animal = df.groupby('Animal')
avg_age = animal['Age(yrs)'].mean()

for index, row in df.iterrows():
    if row['Age(yrs)'] > avg_age[row['Animal']]:
        above_avg.append(row['Pet Name'])
print(above_avg)
  • Related