Any tips on how to check whether a pandas column contains a certain word?
import numpy as np
import pandas as pd

# initialise data of lists.
data = {'Colour': ['Blue andtext', 'Greys', 'Potato', 'Yellow', 'Tree'],
        'Values': [20, 21, 19, 18, 44]}
df2 = pd.DataFrame(data)
Let's say:
colours = ['Blue','Grey','Yellow']
How do I check whether each value in df2['Colour'] contains one of these colours, and put the matched colour in a new column?
The output should be:
      Colour  Values  Actualcolour
Blue andtext      20          Blue
       Greys      21          Grey
      Potato      19           NaN
      Yellow      18        Yellow
        Tree      44           NaN
Answer:
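How about this? (It keeps only cells that exactly equal one of the colours, so 'Blue andtext' and 'Greys' get NaN.)
df2['ActualColour'] = [x if x in colours else np.nan for x in df2.Colour]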
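A minimal sketch of a substring-aware variant of this comprehension, so that 'Blue andtext' and 'Greys' are matched as in the desired output. It assumes the first colour in `colours` found inside a cell is the one to report; `first_colour` is a hypothetical helper name, not part of the original answer.

```python
import numpy as np
import pandas as pd

data = {'Colour': ['Blue andtext', 'Greys', 'Potato', 'Yellow', 'Tree'],
        'Values': [20, 21, 19, 18, 44]}
df2 = pd.DataFrame(data)
colours = ['Blue', 'Grey', 'Yellow']

def first_colour(text):
    # Hypothetical helper: return the first entry of `colours` that occurs
    # as a substring of `text`, or NaN when none of them does.
    return next((c for c in colours if c in text), np.nan)

df2['ActualColour'] = [first_colour(x) for x in df2['Colour']]
print(df2)
```

Because the first hit wins, the order of `colours` matters if one name is a substring of another (for example 'Red' and 'Redwood').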
You could also convert `colours` into a DataFrame and left-join it onto `df2` (again, this matches exact values only).
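A minimal sketch of the left-join idea, assuming exact matches are what you want; the lookup frame name `colour_df` is illustrative. Since the join compares whole cell values, only 'Yellow' finds a partner here.

```python
import pandas as pd

data = {'Colour': ['Blue andtext', 'Greys', 'Potato', 'Yellow', 'Tree'],
        'Values': [20, 21, 19, 18, 44]}
df2 = pd.DataFrame(data)
colours = ['Blue', 'Grey', 'Yellow']

# Lookup table: the join key and the value copied across are the same strings.
colour_df = pd.DataFrame({'Colour': colours, 'ActualColour': colours})

# how='left' keeps every row of df2 in its original order;
# rows without an exact match get NaN in ActualColour.
df2 = df2.merge(colour_df, on='Colour', how='left')
print(df2)
```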
Answer:
Use `pd.Series.where` and `isin` (note that `isin` also matches exact values only):
df2["Actualcolour"] = df2["Colour"].where(df2["Colour"].isin(colours))
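print(df2)
   Colour  Values  Actualcolour
0    Blue      20          Blue
1    Grey      21          Grey
2  Potato      19           NaN
3  Yellow      18        Yellow
4    Tree      44           NaN
(This output comes from a `Colour` column holding exactly 'Blue' and 'Grey'. With the question's data, rows 0 and 1 would be NaN, because `isin` does no substring matching.)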
Or use `Series.str.extract`, adding word boundaries or case-insensitive matching if required (word boundaries would stop 'Grey' from matching inside 'Greys'):
df2["ActualColor"] = df2["Colour"].str.extract(f"({'|'.join(colours)})", expand=False)
print(df2)
         Colour  Values  ActualColor
0  Blue andtext      20         Blue
1         Greys      21         Grey
2        Potato      19          NaN
3        Yellow      18       Yellow
4          Tree      44          NaN
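A sketch of the "word boundary / ignore case" options mentioned above, using the question's data. It assumes colour names might contain regex metacharacters, hence `re.escape`; the column names `AnyCase` and `WholeWord` are illustrative only.

```python
import re
import pandas as pd

data = {'Colour': ['Blue andtext', 'Greys', 'Potato', 'Yellow', 'Tree'],
        'Values': [20, 21, 19, 18, 44]}
df2 = pd.DataFrame(data)
colours = ['Blue', 'Grey', 'Yellow']

# Escape each name so characters like '.' or '+' are matched literally.
alternation = '|'.join(map(re.escape, colours))

# Case-insensitive substring match: 'Greys' still yields 'Grey'.
# extract returns the matched text as it appears in the cell.
df2['AnyCase'] = df2['Colour'].str.extract(
    f'({alternation})', flags=re.IGNORECASE, expand=False)

# Whole-word match: \b requires a word boundary on both sides,
# so 'Grey' is no longer found inside 'Greys'.
df2['WholeWord'] = df2['Colour'].str.extract(
    rf'\b({alternation})\b', expand=False)
print(df2)
```

Which variant to use depends on whether 'Greys' should count as 'Grey' (it does in the desired output, so the boundary-free pattern fits the question).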
Answer:
Another option is to apply a function to the column with `.apply` (again, exact matches only):
df2["ActualColour"] = df2["Colour"].apply(lambda x: x if x in colours else np.nan)