I have a DataFrame and want to use pandas to create a new column containing specific words found in another column. For example, given a text column and a list of words, I would like to extract the words that appear in each row into a new column. Here is what I tried:
import re
txt = df['text']
x = re.findall(("apple|banana|orange"), txt)
print(x)
TypeError: expected string or bytes-like object
Note that there are empty cells in the text column.
CodePudding user response:
You could use pandas Series.str.contains to filter the df and assign the result to a new column:
import pandas as pd
df = pd.DataFrame([["a"], ["applea"], ["bana"], ["bananak"], ["banana"], ["orange"]],columns=["fruits"])
df["new"] = df.loc[df["fruits"].str.contains(pat=r"banana|apple|orange", na=False), "fruits"]
>>> df
    fruits      new
0        a      NaN
1   applea   applea
2     bana      NaN
3  bananak  bananak
4   banana   banana
5   orange   orange
CodePudding user response:
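re.findall expects a single string, but df['text'] is a Series, hence the TypeError. In your case, use the vectorized string method instead: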
df['text'].str.findall("apple|banana|orange")
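A minimal sketch of how this behaves when the column has empty cells, as mentioned in the question. The sample rows and the `found` column name are made up for illustration; only the `text` column and the pattern come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; None/NaN stand in for the empty cells.
df = pd.DataFrame(
    {"text": ["I like apple and banana", None, "orange juice", "nothing here", np.nan]}
)

pattern = r"apple|banana|orange"

# Series.str.findall returns a list of matches per row;
# rows with missing text stay missing instead of raising a TypeError.
matches = df["text"].str.findall(pattern)

# Join each list into one comma-separated string ("found" is an assumed column name).
# Rows with no match become an empty string; missing rows remain NaN.
df["found"] = matches.str.join(", ")

print(df)
```

If you would rather keep the lists themselves, assign `matches` to the new column directly and skip the join.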
CodePudding user response:
Assuming you have a list:
lst = ['Apple', 'Orange', 'Banana']
I would do:
for i, row in df.iterrows():
    for item in lst:
        # Skip empty cells (NaN), which are not strings
        if isinstance(row['txt'], str) and item in row['txt']:
            df.loc[i, 'newcol'] = item
Maybe a little convoluted, but that works!
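If the loop gets slow on a large frame, a vectorized alternative might look like the sketch below. It is not identical to the loop: Series.str.extract keeps the first word that appears in each text, whereas the loop keeps the last matching item of the list. The sample rows are invented for illustration; `lst`, `txt` and `newcol` are the names used above.

```python
import re
import pandas as pd

lst = ["Apple", "Orange", "Banana"]

# Hypothetical sample data, including an empty cell.
df = pd.DataFrame(
    {"txt": ["Apple pie", None, "fresh Banana", "no fruit", "Orange and Apple"]}
)

# Build one alternation pattern with a single capture group.
# re.escape guards against regex metacharacters in the words.
pattern = "(" + "|".join(re.escape(word) for word in lst) + ")"

# expand=False returns a Series; rows without a match (or with missing text) become NaN.
df["newcol"] = df["txt"].str.extract(pattern, expand=False)

print(df)
```

The match is case-sensitive, like the loop. Passing flags=re.IGNORECASE to str.extract would relax that.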