I am trying to apply a function that calculates a max value over a list of ids, and save the results in one file using another function. Is this the right way to do it? I am getting redundant results.
data1
import pandas as pd

animals_age1 = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Falcon', 'Falcon'],
                             'Age': [10, 20, 30, 40, 50]})
function1 (calculates max)
def function_1(df):
    df = df[df.Age >= 0]
    return df.groupby(['Animal'])\
             .apply(lambda x: pd.Series({'Age_max': x.Age.max()})).reset_index()
data2
animals_age2 = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Falcon', 'Falcon', 'Falcon',
                                        'Parrot', 'Parrot', 'Parrot', 'Parrot', 'Parrot'],
                             'Age': [10, 20, 30, 40, 50, 10, 20, 30, 40, 60]})
function2 (calculates max for a list of unique ids)
def function_2(df):
    results = []
    for id in df['Animal'].unique():
        results.append(function_1(df))
    results = pd.concat(results, axis=0)
    return results
CodePudding user response:
Call the function on each DataFrame separately. The function already aggregates by Animal, so looping over the unique values of the Animal column is not necessary:
def function_1(df):
    return df[df.Age >= 0].groupby('Animal', as_index=False).agg(Age_max=('Age', 'max'))
df1 = function_1(animals_age1)
print (df1)
   Animal  Age_max
0  Falcon       50
df1 = function_1(animals_age2)
print (df1)
   Animal  Age_max
0  Falcon       50
1  Parrot       60
EDIT:
If you really need the second function, filter the Animal column by each unique value id inside the loop, so that function_1 receives only the rows for that animal:
def function_2(df):
    results = []
    for id in df['Animal'].unique():
        results.append(function_1(df[df['Animal'].eq(id)]))
    results = pd.concat(results, axis=0)
    return results
df2 = function_2(animals_age2)
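To see where the redundant rows in the question come from, here is a small self-contained sketch comparing the two loops. The names function_2_original and function_2_fixed are made up for this comparison and are not in the posts above. The ignore_index=True argument is also an addition, used only to get a clean 0..n-1 index.

```python
import pandas as pd

animals_age2 = pd.DataFrame({
    "Animal": ["Falcon"] * 5 + ["Parrot"] * 5,
    "Age": [10, 20, 30, 40, 50, 10, 20, 30, 40, 60],
})

def function_1(df):
    # Max age per animal, ignoring negative ages.
    return (df[df.Age >= 0]
            .groupby("Animal", as_index=False)
            .agg(Age_max=("Age", "max")))

def function_2_original(df):
    # Hypothetical name: mirrors the question's loop, which passes the
    # whole DataFrame on every iteration, so each pass returns all groups.
    results = [function_1(df) for _ in df["Animal"].unique()]
    return pd.concat(results, axis=0)

def function_2_fixed(df):
    # Hypothetical name: mirrors the answer's loop, which passes only
    # the rows belonging to the current animal.
    results = [function_1(df[df["Animal"].eq(animal)])
               for animal in df["Animal"].unique()]
    return pd.concat(results, axis=0, ignore_index=True)

redundant = function_2_original(animals_age2)
fixed = function_2_fixed(animals_age2)

print(len(redundant))  # 2 animals x 2 result rows each = 4 rows
print(fixed)
```

With two unique animals, the original loop yields four rows (each animal's max appears twice), while the filtered loop yields one row per animal, the same as calling function_1 once on the whole DataFrame.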