Python loop to search multiple sets of keywords in all columns of dataframe

Time:09-02

I've used the code below to search across all columns of my dataframe to see if each row has the word "pool" and the words "slide" or "waterslide".

AR11_regex = r"""
(?=.*(?:slide|waterslide)).*pool
"""
f = lambda x: x.str.findall(AR11_regex, flags=re.VERBOSE|re.IGNORECASE)
d['AR'][AR11] = d['AR'].astype(str).apply(f).any(1).astype(int)

This has worked fine, but when I write a for loop to do this for more than one regex pattern (e.g., AR11, AR12, AR21) using the code below, the new columns are all zeros (i.e., the search is not finding any hits).

for i in AR_list:
    print(i)
    pat = i + "_regex"
    print(pat)
    f = lambda x: x.str.findall(i + "_regex", flags= re.VERBOSE|re.IGNORECASE)
    d['AR'][str(i)] = d['AR'].astype(str).apply(f).any(1).astype(int)

Any advice on why this loop didn't work would be much appreciated!

CodePudding user response:

Could you post some information about the contents of your dataframe (what does the data inside look like)? Does print(pat) output the pattern you are trying to search for? The problem may also be in your last line, where you do several conversions at once; an error at one step could end up as a zero after the later conversions.
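
One quick way to check what the loop is really searching for is to compare the concatenated name with the pattern it was meant to refer to. A minimal sketch, using only the re module and a made-up sample string (text is an assumption, not data from the question):

```python
import re

# Pattern from the question.
AR11_regex = r"""
(?=.*(?:slide|waterslide)).*pool
"""

text = "Large pool with a waterslide"  # hypothetical sample cell

# Concatenation builds the *name* of the variable, not its value.
name = "AR11" + "_regex"

flags = re.VERBOSE | re.IGNORECASE
hits_with_name = re.findall(name, text, flags=flags)           # searches for the literal text "AR11_regex"
hits_with_pattern = re.findall(AR11_regex, text, flags=flags)  # searches with the real pattern

print(name)
print(hits_with_name)
print(hits_with_pattern)
```

If the first list is empty while the second is not, the zeros most likely come from searching for the literal name rather than from the data.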

CodePudding user response:

A small sample data frame would help understand your question. In any case, your code sample appears to have a multitude of problems.

  1. i + "_regex" is just the string "AR11_regex". It won't evaluate to the value of the variable with the identifier AR11_regex. Put your regex patterns in a dict.

  2. d['AR'] selects the values in the AR column. It seems like you expect it to be a row.

  3. d['AR'][str(i)] is adding a new row. It seems like you want to add a new column.

  4. Lastly, this approach to setting a cell generally (always for me) yields the following warning: /var/folders/zj/pnrcbb6n01z2qv1gmsk70b_m0000gn/T/ipykernel_13985/876572204.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

The suggested approach would be to use "at", as in d.at[str(i), 'AR'] or some such.
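
As a rough illustration of the difference, here is a small sketch assuming d is a single DataFrame; the frame, its index labels and the AR11 values are all made up:

```python
import pandas as pd

# Hypothetical frame with explicit row labels.
d = pd.DataFrame({"AR": ["pool", "slide"]}, index=["r0", "r1"])

# .at sets exactly one cell in a single step: row label first, column label second.
d.at["r0", "AR"] = "pool and slide"

# A new column is created by assigning to a column label on the frame itself,
# not by indexing into an existing column.
d["AR11"] = [1, 0]

print(d)
```

Because each assignment goes through the frame in one indexing step, there is no intermediate slice to trigger the warning.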

Add a sample data frame and refine your question for more suggestions.
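
In the meantime, here is one possible shape for the dict-based loop. It is only a sketch: it assumes d is a single DataFrame, the sample rows are invented, and AR12 is a made-up pattern added so that the loop has more than one entry. Like the original code, it tests each cell on its own, so "pool" and "slide" have to appear in the same cell.

```python
import re
import pandas as pd

# Hypothetical sample data; the question did not include any.
d = pd.DataFrame({
    "title": ["House with pool", "Flat in town", "Resort"],
    "notes": ["Pool and waterslide for kids", "No garden", "Slide next to the pool"],
})

# Patterns keyed by the name of the column they should fill.
# AR11 is the pattern from the question; AR12 is invented for illustration.
patterns = {
    "AR11": r"""
        (?=.*(?:slide|waterslide)).*pool
    """,
    "AR12": r"""
        garden
    """,
}

# String copy of the original columns, taken before any result columns are added.
text = d.astype(str)

for name, pattern in patterns.items():
    compiled = re.compile(pattern, flags=re.VERBOSE | re.IGNORECASE)
    # True for every cell in which the pattern matches somewhere.
    hit = text.apply(lambda col: col.map(lambda s: bool(compiled.search(s))))
    # 1 if any cell in the row matched, else 0, stored as a new column.
    d[name] = hit.any(axis=1).astype(int)

print(d[["AR11", "AR12"]])
```

Looking the pattern up in the dict replaces the string concatenation, and assigning to d[name] adds a column instead of writing into an existing one.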
