Filling Missing Values Based on String Condition-CodePudding

I'm trying to write a function to impute some null values from a Numeric column based on string conditions from a Text column.

My attempt example:

def fill_nulls(string, val):
if df['TextColumn'].str.contains(string) == True:
    df['NumericColumn'] = df['NumericColumn'].fillna(value=val)

The 'string' and 'val' parameters are manually entered. I tried applying function to my numeric column but it gives me this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried to find examples that I could tweak for my situation, but they all involved using 'groupby' to get the average numeric values relating to the discrete string values that had only a handful of unique values. Basically, only exact wording could be imputed, whereas I'm trying to generalize my string filtering by using partial strings and imputing the null values in the numeric column based on the resulting rows of the text columns.

CodePudding user response：

You can use

x = df['TextColumn'].map(lambda x: x.contains(string))
df['NumericColumn'][x] = df['NumericColumn'][x].fillna(value=val)

First you generate the list of elements you want to replace with the map, then use that list to replace elements you want to replace.

edit: fixed typo in code

CodePudding user response：

Use Series.str.contains with DataFrame.loc:

m = df['TextColumn'].str.contains(string)
df.loc[m, 'NumericColumn'] = df.loc[m, 'NumericColumn'].fillna(value=val)

Or chain conditions by & for bitwise AND for test missing values by Series.isna and assign value in DataFrame.loc:

m1 = df['TextColumn'].str.contains(string)
m2 = df['NumericColumn'].isna()
df.loc[m1 & m2, 'NumericColumn'] = val