Add new columns with regex-CodePudding

I have a DataFrame and want to use pandas to create a new column containing specific words found in another column. In this example, I have a text column and a list of words, and I would like to put the words that occur in each row's text into a new column.

import re

txt = df['text']
x = re.findall(("apple|banana|orange"), txt)
print(x)

TypeError: expected string or bytes-like object

It is important to note that the text column contains empty cells.

CodePudding user response：

You could use pandas Series.str.contains to filter the df and assign the result to a new column:

import pandas as pd

df = pd.DataFrame([["a"], ["applea"], ["bana"], ["bananak"], ["banana"], ["orange"]], columns=["fruits"])
# Keep the cell value where the pattern matches; non-matching rows become NaN
df["new"] = df.loc[df["fruits"].str.contains(pat=r"banana|apple|orange"), "fruits"]

>>> df
    fruits      new
0        a      NaN
1   applea   applea
2     bana      NaN
3  bananak  bananak
4   banana   banana
5   orange   orange
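
Note that the approach above copies the whole cell value (for example "bananak") rather than the matched word, and it does not account for the empty cells mentioned in the question. A minimal sketch of one way to handle both, assuming the empty cells are NaN/None; the sample data here is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; the None row stands in for an empty cell.
df = pd.DataFrame({"fruits": ["a", "applea", "bana", None, "banana", "orange"]})

# str.extract needs a capture group; expand=False returns a Series.
# Rows without a match (and empty cells) come back as NaN.
df["new"] = df["fruits"].str.extract(r"(banana|apple|orange)", expand=False)

# If a boolean mask is still wanted, na=False treats empty cells as "no match".
mask = df["fruits"].str.contains(r"banana|apple|orange", na=False)

print(df)
print(mask.tolist())
```

Here "applea" yields "apple" because only the captured part of the text is kept.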

CodePudding user response：

In your case, use:

df['text'].str.findall("apple|banana|orange")
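
For context, the original TypeError arises because re.findall expects a single string, while df['text'] is a whole Series; the .str accessor applies the pattern row by row instead. A small sketch of what this returns, using made-up sample text and a hypothetical column name "found":

```python
import pandas as pd

# Hypothetical sample data; None stands in for an empty cell.
df = pd.DataFrame(
    {"text": ["I ate an apple and a banana", None, "no fruit here", "orange juice"]}
)

# Each row becomes a list of matches; empty cells stay NaN,
# and rows without a match get an empty list.
matches = df["text"].str.findall(r"apple|banana|orange")

# Optionally join each list into a single string for the new column.
df["found"] = matches.str.join(", ")

print(df)
```

Joining is optional; keeping the lists may be more convenient if the matches are processed further.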

CodePudding user response：

Assuming you have a list:

lst = ['Apple', 'Orange', 'Banana']

I would do:

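    for i, row in df.iterrows():
        text = row['text']
        if not isinstance(text, str):
            continue  # skip empty (NaN) cells
        for item in lst:
            if item in text:  # case-sensitive substring match
                df.loc[i, 'newcol'] = item  # a later match overwrites an earlier one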

Maybe a little convoluted, but it works!
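
If the loop turns out to be slow on a large frame, the same idea can probably be expressed without iterating. A rough sketch, assuming the word list above and a 'text' column; the sample rows are invented, and case-insensitive matching is an assumption added here, not something the loop does:

```python
import re
import pandas as pd

lst = ["Apple", "Orange", "Banana"]

# Hypothetical sample data; None stands in for an empty cell.
df = pd.DataFrame({"text": ["an apple a day", None, "Banana bread", "plain toast"]})

# Build one alternation pattern from the list; re.escape guards against
# words that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(word) for word in lst) + ")"

# Extract the first matching word per row, ignoring case.
# Empty cells and rows without a match become NaN.
df["newcol"] = df["text"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

print(df)
```

Unlike the loop, this keeps the first match in each row and returns the word as it appears in the text, not as it is spelled in the list.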