Home > Software design >  Pandas masking the column replaces non matched value to NaN
Pandas masking the column replaces non matched value to NaN

Time:07-07

Say I have a dataframe

d = {
    "cid": [1, 3, 3],
    "txt": ["Kakashi is GOAT", "Eleven is El", "Hello agian"],
     "anime": ['Kakashi ', 'el', 'mouse']
}

df = pd.DataFrame(data=d)
df['code'] = df['anime'].astype('category').cat.codes

I need to create a new column code which will contain value from code if anime is present in txt else 999

This is my mask

mask = df.apply(lambda x: x.anime.lower() in x.txt.lower(), axis=1)
df['newCol'] = 999
df['newCol'] = df.loc[mask, 'code']

but this gives me float values and the replaces the 999 as NaN

Output :

df
   cid              txt     anime  code  newCol
0    1  Kakashi is GOAT  Kakashi      0     0.0
1    3     Eleven is El        el     1     1.0
2    3      Hello agian     mouse     2     NaN

Expected :

df
   cid              txt     anime    code  newCol
0    1  Kakashi is GOAT    Kakashi      0     0
1    3     Eleven is El      el         1     1
2    3      Nothing         mouse       2     999

Note that I need to do it the masking way there are various methods in which this can be achieved though

CodePudding user response:

You could skip the mask step and just assign the newCol values using (almost) the same lambda function:

df['newCol'] = df.apply(lambda x: x.code if x.anime.lower() in x.txt.lower() else 999, axis=1)

Output:

   cid              txt     anime  code  newCol
0    1  Kakashi is GOAT  Kakashi      0       0
1    3     Eleven is El        el     1       1
2    3      Hello agian     mouse     2     999

Note that el matches against Eleven and El in the string Eleven is El. If you want to avoid that, you should use a regex match instead, using word boundaries to ensure el can only match El, not Eleven:

import re

df['newCol'] = df.apply(lambda x: x.code if re.search(rf'\b{x.anime}\b', x.txt, re.I) else 999, axis=1)

The output is the same. Note that by using the re.I (case-insensitive) flag, we can avoid the need to convert both strings to lower-case.

  • Related