Create a function iterating over pandas DataFrame rows to replace null values

Time:03-08

I've been struggling with this piece of code for days, so I thought I should give it a try here.

I have a DataFrame with some null values that I want to replace with mean values that I have in another DataFrame. I've created a function that should later be applied with a lambda, but I keep getting an error.


I have a DataFrame like this:
CustomerType  Category     Satisfaction   Age
Not Premium   Electronics  Not Satisfied  NaN
Not Premium   Beauty       Satisfied      NaN
Premium       Sports       Satisfied      38.0
Not Premium   Sports       Not Satisfied  NaN

I need to fill it with this data:

CustomerType  Satisfaction   Age
Not Premium   Not Satisfied  32.440740
Not Premium   Satisfied      28.896348
Premium       Not Satisfied  43.767723
Premium       Satisfied      44.075901

So I've created a function:

def fill_age(x):
    if x.isnull()== True:
        return[(grp.CustomerType==x.CustomerType) | (grp.Satisfaction==x.Satisfaction)]['Age'].values[0]

That I would like to apply to my dataframe using a lambda function to iterate through all the rows:

df['Age'] = [df.apply(lambda x: fill_age(x) if np.isnan(x['Age']) else 
                                            x['Age'], axis=1) for x in df]

But I keep getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


Can anyone help me? Thank you so much!!

CodePudding user response:

Assuming the problem is the way apply is called on your DataFrame, and that fill_age() works correctly on individual df["Age"] values, you need to replace that statement with the one below. It evaluates each value x and, through the if/else conditional, either keeps the current Age or replaces it with the external data. This code shouldn't raise the error:

df["Age"] = df["Age"].apply(lambda x: fill_age(x) if np.isnan(x) else x)
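If fill_age() needs the whole row rather than a single Age value (to look up CustomerType and Satisfaction in the means table), a row-wise variant could look like the sketch below. It rebuilds the sample data from the question, assumes the means DataFrame is named grp as in the question, and assumes every CustomerType/Satisfaction pair in df also appears in grp. The name fill_age_row is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Not Premium"],
    "Category": ["Electronics", "Beauty", "Sports", "Sports"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Satisfied", "Not Satisfied"],
    "Age": [np.nan, np.nan, 38.0, np.nan],
})
grp = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Premium"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Not Satisfied", "Satisfied"],
    "Age": [32.440740, 28.896348, 43.767723, 44.075901],
})


def fill_age_row(row):
    """Hypothetical helper: return the row's Age, or the matching mean from grp."""
    if pd.isna(row["Age"]):  # scalar check, so no ambiguous truth value
        # Both keys must match, hence & rather than |.
        match = grp[(grp["CustomerType"] == row["CustomerType"])
                    & (grp["Satisfaction"] == row["Satisfaction"])]
        # Assumes at least one matching row exists in grp.
        return match["Age"].iloc[0]
    return row["Age"]


# axis=1 passes each row to the function as a Series.
df["Age"] = df.apply(fill_age_row, axis=1)
```

The check is done on the scalar row["Age"], not on the whole row, which is what avoids the "truth value of a Series is ambiguous" error.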

CodePudding user response:

We should try to avoid using apply, so we could use this instead:

df['Age'] = df['Age'].fillna(
    df.groupby(['CustomerType', 'Satisfaction'])['Age'].transform('first')
)
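Note that transform('first') takes the first non-null Age already present in each group of df. If the means live in a separate DataFrame, as with grp in the question, a left merge might be closer to what is wanted. This is only a sketch: it rebuilds the sample data from the question, assumes grp has exactly one row per CustomerType/Satisfaction pair, and the names lookup and Age_mean are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Not Premium"],
    "Category": ["Electronics", "Beauty", "Sports", "Sports"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Satisfied", "Not Satisfied"],
    "Age": [np.nan, np.nan, 38.0, np.nan],
})
grp = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Premium"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Not Satisfied", "Satisfied"],
    "Age": [32.440740, 28.896348, 43.767723, 44.075901],
})

# Left merge keeps every row of df in its original order and attaches the
# group mean as a new column; validate guards the one-row-per-pair assumption.
lookup = df.merge(
    grp,
    on=["CustomerType", "Satisfaction"],
    how="left",
    suffixes=("", "_mean"),
    validate="many_to_one",
)

# merge resets the index, so realign the means to df's index before filling.
means = pd.Series(lookup["Age_mean"].to_numpy(), index=df.index)
df["Age"] = df["Age"].fillna(means)
```

Existing ages (38.0 for the Premium row) are left untouched; only the NaN entries pick up the mean for their CustomerType/Satisfaction pair.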