Add new columns with regex

Time:10-24

I have a data frame and I want to use pandas to create a new column containing specific words found in another column. In this example, I have a text column and a list of words, and I would like to put the words that appear in each text into a new column.


import re

txt = df['text']
x = re.findall(("apple|banana|orange"), txt)
print(x) 

TypeError: expected string or bytes-like object

It is important to note that there are empty cells in the text column.

CodePudding user response:

You could use pandas Series.str.contains to filter the df and assign the result to a new column:

import pandas as pd

df = pd.DataFrame([["a"], ["applea"], ["bana"], ["bananak"], ["banana"], ["orange"]],columns=["fruits"])
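# na=False treats empty (NaN) cells as non-matches instead of propagating NaN into the mask
mask = df["fruits"].str.contains(pat=r"banana|apple|orange", na=False)
# keep the cell value where the mask is True, NaN elsewhere
df["new"] = df["fruits"].where(mask)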

>>> df
    fruits      new
0        a      NaN
1   applea   applea
2     bana      NaN
3  bananak  bananak
4   banana   banana
5   orange   orange
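
Note that the snippet above copies the whole cell whenever it contains one of the words (for example `applea`). If only the matched word is wanted, one option, which is not part of the original answer, is `Series.str.extract` with a capture group. This is a minimal sketch on the same sample data; the column name `word` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["a"], ["applea"], ["bana"], ["bananak"], ["banana"], ["orange"]],
    columns=["fruits"],
)

# The parentheses form a capture group; expand=False returns a Series.
# Rows without a match (and empty cells) come back as NaN.
df["word"] = df["fruits"].str.extract(r"(banana|apple|orange)", expand=False)
print(df)
```

`str.extract` only returns the first match in each cell, so it suits the case where at most one of the words is expected per row.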

CodePudding user response:

In your case, do:

df['text'].str.findall("apple|banana|orange")
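
`str.findall` returns a list of matches per row and leaves empty cells as NaN, so it avoids the TypeError from calling `re.findall` on the whole Series. A minimal sketch of storing the result in a new column might look like this; the sample data and the column name `found` are assumptions, since the real `df['text']` is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one empty cell.
df = pd.DataFrame(
    {"text": ["I like apple and banana", np.nan, "no fruit here", "orange juice"]}
)

pattern = r"apple|banana|orange"

# One list of matches per row; NaN cells stay NaN.
matches = df["text"].str.findall(pattern)

# Join each list into a comma-separated string; NaN is left as NaN,
# and rows with no match become an empty string.
df["found"] = matches.str.join(", ")
print(df)
```

If the lists themselves are more useful than a joined string, `df["found"] = matches` works as well.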

CodePudding user response:

Assuming you have a list:

lst = ['Apple', 'Orange', 'Banana']

I would do:

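    for i, row in df.iterrows():
        text = row['text']
        if not isinstance(text, str):  # skip empty (NaN) cells
            continue
        for item in lst:
            if item in text:  # case-sensitive; a later match overwrites an earlier one
                df.loc[i, 'newcol'] = item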

Maybe a little convoluted, but that works!
