I've been struggling with this piece of code for days, so I thought I'd give it a try here.
I have a DataFrame with some null values that I want to replace with mean values stored in another DataFrame. I've written a function that should later be applied with a lambda, but I keep getting an error.
I have a DataFrame like this:
CustomerType | Category | Satisfaction | Age |
---|---|---|---|
Not Premium | Electronics | Not Satisfied | NaN |
Not Premium | Beauty | Satisfied | NaN |
Premium | Sports | Satisfied | 38.0 |
Not Premium | Sports | Not Satisfied | NaN |
That I need to fill with this data:
CustomerType | Satisfaction | Age |
---|---|---|
Not Premium | Not Satisfied | 32.440740 |
Not Premium | Satisfied | 28.896348 |
Premium | Not Satisfied | 43.767723 |
Premium | Satisfied | 44.075901 |
So I've created a function:
def fill_age(x):
    if x.isnull()== True:
        return[(grp.CustomerType==x.CustomerType) | (grp.Satisfaction==x.Satisfaction)]['Age'].values[0]
That I would like to apply to my dataframe using a lambda function to iterate through all the rows:
df['Age'] = [df.apply(lambda x: fill_age(x) if np.isnan(x['Age']) else
x['Age'], axis=1) for x in df]
But I keep getting this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anyone help me? Thank you so much!
CodePudding user response:
Assuming that you are calling `apply` incorrectly on your DataFrame and that `fill_age()` works correctly on `df["Age"]` values, you need to replace that statement with one that evaluates each `x` and assigns a value (the current Age, or a replacement taken from the external data) through an if-else conditional. This code shouldn't raise errors:
df["Age"] = df["Age"].apply(lambda x: fill_age(x) if np.isnan(x) else x)
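If `fill_age` needs the other columns to find the right mean, it has to receive the whole row rather than a single Age value. The `ValueError` in the question comes from `if x.isnull() == True`: when `x` is a row, that expression is a Series, and a Series has no single truth value. Below is a minimal row-wise sketch. It assumes the lookup table is called `grp` as in the question, that both `CustomerType` and `Satisfaction` must match (`&` rather than `|`), and that every combination present in `df` also exists in `grp`; the sample frames are rebuilt from the tables in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the tables in the question.
df = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Not Premium"],
    "Category": ["Electronics", "Beauty", "Sports", "Sports"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Satisfied", "Not Satisfied"],
    "Age": [np.nan, np.nan, 38.0, np.nan],
})
grp = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Premium"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Not Satisfied", "Satisfied"],
    "Age": [32.440740, 28.896348, 43.767723, 44.075901],
})

def fill_age(row):
    """Return the row's Age, or the matching mean from grp when Age is missing."""
    if pd.isna(row["Age"]):  # scalar check, so no ambiguous truth value
        match = grp[(grp["CustomerType"] == row["CustomerType"])
                    & (grp["Satisfaction"] == row["Satisfaction"])]
        # Assumption: a matching row always exists in grp.
        return match["Age"].iloc[0]
    return row["Age"]

# axis=1 passes each row (a Series) to fill_age.
df["Age"] = df.apply(fill_age, axis=1)
```

Because the missing-value check lives inside `fill_age`, no extra lambda or list comprehension is needed around `apply`.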
CodePudding user response:
We should try to avoid `apply`, so we could use this instead:
df['Age'] = df['Age'].fillna(
    df.groupby(['CustomerType', 'Satisfaction'])['Age'].transform('first')
)
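Note that `transform('first')` takes the first non-null Age within each group of `df` itself, so a group whose ages are all missing stays unfilled. If the values should instead come from the separate table of means, one possible vectorized sketch is a left merge on the two key columns. It assumes the lookup table is named `grp` as in the question and has one row per (`CustomerType`, `Satisfaction`) pair; `Age_mean` is a hypothetical column name used only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the tables in the question.
df = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Not Premium"],
    "Category": ["Electronics", "Beauty", "Sports", "Sports"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Satisfied", "Not Satisfied"],
    "Age": [np.nan, np.nan, 38.0, np.nan],
})
grp = pd.DataFrame({
    "CustomerType": ["Not Premium", "Not Premium", "Premium", "Premium"],
    "Satisfaction": ["Not Satisfied", "Satisfied", "Not Satisfied", "Satisfied"],
    "Age": [32.440740, 28.896348, 43.767723, 44.075901],
})

# Look up the mean for every row; a left merge keeps df's rows and their order
# as long as grp has unique key combinations.
means = df.merge(
    grp.rename(columns={"Age": "Age_mean"}),  # hypothetical column name
    on=["CustomerType", "Satisfaction"],
    how="left",
)["Age_mean"]
means.index = df.index  # realign, since merge returns a fresh RangeIndex

# Only the missing ages are replaced; existing values are kept.
df["Age"] = df["Age"].fillna(means)
```

Rows whose key pair is absent from `grp` would simply keep their `NaN` here, rather than raising an error.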