How to check if a certain df['column'] contains a word from a list in Python?

Time:09-21

Any tips on how to see if a certain word is inside a pandas column?

import numpy as np
import pandas as pd

# Initialise the data as a dict of lists.
data = {'Colour': ['Blue andtext', 'Greys', 'Potato', 'Yellow', 'Tree'],
        'Values': [20, 21, 19, 18, 44]}

df2 = pd.DataFrame(data)

Let's say:

colours = ['Blue','Grey','Yellow']

How do I check whether each value in df2['Colour'] contains one of these colours, and put the matching colour in a new column?

Output should be

Colour           Values  Actualcolour
Blue andtext     20      Blue
Greys            21      Grey
Potato           19      NaN
Yellow           18      Yellow
Tree             44      NaN

CodePudding user response:

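How about this? Note that it only keeps values that match a colour exactly, so 'Blue andtext' and 'Greys' become NaN:

df2['ActualColour'] = [x if x in colours else np.nan for x in df2.Colour]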

You could also convert colours into a DataFrame and left join it onto df2.
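A minimal sketch of that left join, assuming the question's data. The `lookup` frame and its column layout are illustrative names, not something the answer specified. Like the list comprehension, a join on the column only catches exact matches.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Colour": ["Blue andtext", "Greys", "Potato", "Yellow", "Tree"],
    "Values": [20, 21, 19, 18, 44],
})
colours = ["Blue", "Grey", "Yellow"]

# Hypothetical lookup table: the colour appears twice so that one copy acts as
# the join key and the other survives as the new column.
lookup = pd.DataFrame({"Colour": colours, "ActualColour": colours})

# how="left" keeps every row of df2; rows with no exact match get NaN.
joined = df2.merge(lookup, on="Colour", how="left")
print(joined)
```

With this data only 'Yellow' matches exactly, so the other rows come back as NaN. The join is a reasonable fit when the column already holds clean colour names.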

CodePudding user response:

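Use pd.Series.where and isin when the column holds the exact colour names (as in the sample below, where the values are 'Blue' and 'Grey'):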

df2["Actualcolour"] = df2["Colour"].where(df2["Colour"].isin(colours))

print(df2)

   Colour  Values Actualcolour
0    Blue      20         Blue
1    Grey      21         Grey
2  Potato      19          NaN
3  Yellow      18       Yellow
4    Tree      44          NaN

Or use pd.Series.str.extract, which also handles partial matches; add a word boundary or ignore case if required:

df2["ActualColor"] = df2["Colour"].str.extract(f"({'|'.join(colours)})")

print(df2)

         Colour  Values ActualColor
0  Blue andtext      20        Blue
1         Greys      21        Grey
2        Potato      19         NaN
3        Yellow      18      Yellow
4          Tree      44         NaN
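The word-boundary and ignore-case variants mentioned above might look like the sketch below. It assumes the question's data; the names `alternatives`, `loose` and `strict` are only illustrative, and `re.escape` is an extra precaution in case a list entry contains regex metacharacters.

```python
import re
import pandas as pd

df2 = pd.DataFrame({
    "Colour": ["Blue andtext", "Greys", "Potato", "Yellow", "Tree"],
    "Values": [20, 21, 19, 18, 44],
})
colours = ["Blue", "Grey", "Yellow"]

# Build the alternation once; escaping keeps entries literal.
alternatives = "|".join(re.escape(c) for c in colours)

# Substring match, ignoring case. expand=False returns a Series, not a DataFrame.
loose = df2["Colour"].str.extract(
    f"({alternatives})", flags=re.IGNORECASE, expand=False
)

# Whole-word match: \b on both sides accepts "Blue andtext" but rejects "Greys".
strict = df2["Colour"].str.extract(rf"\b({alternatives})\b", expand=False)

print(pd.DataFrame({"Colour": df2["Colour"], "loose": loose, "strict": strict}))
```

Note that with re.IGNORECASE the extracted text is whatever appeared in the column, so its casing can differ from the entry in colours. Whether 'Greys' should count as 'Grey' decides which of the two patterns fits.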

CodePudding user response:

Another possible answer is to apply a function to the column in question using .apply:

df2["ActualColour"] = df2["Colour"].apply(lambda x: x if x in colours else np.nan)
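The lambda above only keeps exact matches. If substring matches are wanted, as in the question's expected output, one option is a small helper passed to .apply. This is a sketch; `first_colour` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "Colour": ["Blue andtext", "Greys", "Potato", "Yellow", "Tree"],
    "Values": [20, 21, 19, 18, 44],
})
colours = ["Blue", "Grey", "Yellow"]


def first_colour(text, candidates):
    """Return the first candidate found inside text, or NaN if none is."""
    return next((c for c in candidates if c in text), np.nan)


# args forwards the colour list as the helper's second positional argument.
df2["ActualColour"] = df2["Colour"].apply(first_colour, args=(colours,))
print(df2)
```

This runs a Python-level loop per row, so on large frames the str.extract approach is likely to be faster.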