Pandas masking the column replaces non matched value to NaN-CodePudding

Say I have a dataframe

d = {
    "cid": [1, 3, 3],
    "txt": ["Kakashi is GOAT", "Eleven is El", "Hello agian"],
     "anime": ['Kakashi ', 'el', 'mouse']
}

df = pd.DataFrame(data=d)
df['code'] = df['anime'].astype('category').cat.codes

I need to create a new column code which will contain value from code if anime is present in txt else 999

This is my mask

mask = df.apply(lambda x: x.anime.lower() in x.txt.lower(), axis=1)
df['newCol'] = 999
df['newCol'] = df.loc[mask, 'code']

but this gives me float values and the replaces the 999 as NaN

Output :

df
   cid              txt     anime  code  newCol
0    1  Kakashi is GOAT  Kakashi      0     0.0
1    3     Eleven is El        el     1     1.0
2    3      Hello agian     mouse     2     NaN

Expected :

df
   cid              txt     anime    code  newCol
0    1  Kakashi is GOAT    Kakashi      0     0
1    3     Eleven is El      el         1     1
2    3      Nothing         mouse       2     999

Note that I need to do it the masking way there are various methods in which this can be achieved though

CodePudding user response：

You could skip the mask step and just assign the newCol values using (almost) the same lambda function:

df['newCol'] = df.apply(lambda x: x.code if x.anime.lower() in x.txt.lower() else 999, axis=1)

Output:

   cid              txt     anime  code  newCol
0    1  Kakashi is GOAT  Kakashi      0       0
1    3     Eleven is El        el     1       1
2    3      Hello agian     mouse     2     999

Note that el matches against Eleven and El in the string Eleven is El. If you want to avoid that, you should use a regex match instead, using word boundaries to ensure el can only match El, not Eleven:

import re

df['newCol'] = df.apply(lambda x: x.code if re.search(rf'\b{x.anime}\b', x.txt, re.I) else 999, axis=1)

The output is the same. Note that by using the re.I (case-insensitive) flag, we can avoid the need to convert both strings to lower-case.