How to find multiple keywords in a string column in python

Time:02-05

I have a column

|ABC|
-----
|JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ|

I want to check whether the words "DK" and "PK" both appear in the row. I need to perform this check with different words across the entire column.

match = ['DK', 'PK']

I tried df.ABC.str.split('_').isin(match). The split produces a list in each row, but the isin call then raises an error:

SystemError: <built-in method view of numpy.ndarray object at 0x0000021171056DB0> returned a result with an error set

What is the best way to get the expected output, which is a bool (True or False)?

Thanks.

CodePudding user response:

Maybe either of the two following options:

(?:[A-Z\d]+_)*?([DP]K)\d*_(?:[A-Z\d]+_)*?(?!\1)([DP]K)\d*(?:_[A-Z\d]+)*?

See an online [demo](https://regex101.com/r/KyqtsT/10)


import pandas as pd
df = pd.DataFrame(data={'ABC': ['JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ']})
# Match against the ABC column; REX_TEST does not exist yet at this point
df['REX_TEST'] = df.ABC.str.match(r'(?:[A-Z\d]+_)*?([DP]K)\d*_(?:[A-Z\d]+_)*?(?!\1)([DP]K)\d*(?:_[A-Z\d]+)*?')
print(df)

Or, add leading/trailing underscores to your data before matching:

import pandas as pd
df = pd.DataFrame(data={'ABC': ['JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ']})
df['REX_TEST'] = '_' + df.ABC + '_'
df['REX_TEST'] = df.REX_TEST.str.match(r'(?=.*_PK\d*_)(?=.*_DK\d*_).*')
print(df)

Both options print:

                                              ABC  REX_TEST
0  JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ      True

Note that I wanted to make sure that neither 'DK' nor 'PK' is matched as a substring of a larger word.
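Since the question asks to repeat this with different words, the lookahead pattern from the second option could be built from the `match` list instead of being typed by hand. This is only a sketch: the second sample row and the `HAS_ALL` column name are made up for illustration, and it assumes each keyword may be followed by optional digits, as in the sample data.

```python
import re
import pandas as pd

df = pd.DataFrame(data={'ABC': [
    'JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ',  # contains DK1 and PK1
    'JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_XYZ',      # contains DK1 only (made-up row)
]})
match = ['DK', 'PK']

# One lookahead per keyword: underscore, keyword, optional digits, underscore.
# re.escape guards against keywords that contain regex metacharacters.
pattern = ''.join(rf'(?=.*_{re.escape(word)}\d*_)' for word in match)

# Pad with underscores so keywords at the very start or end are also delimited.
padded = '_' + df['ABC'] + '_'
df['HAS_ALL'] = padded.str.contains(pattern)
print(df)
```

A row is flagged True only when every keyword in `match` appears as its own underscore-delimited token, so changing the list is enough to check a different set of words.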

CodePudding user response:

You can use Python's re library to search a string:

import re
s = "JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ"
r = re.search(r"(DK).*(PK)|(PK).*(DK)",s) # the pipe is used like "or" keyword

If the pattern matches the string, re.search returns a match object, which evaluates as true (otherwise it returns None):

if r:
    print("hello world!")
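To get a True/False value for every row of the column, the same search could be applied row by row. This is a rough sketch: the second sample row and the `HAS_BOTH` column name are invented for illustration. Unlike the underscore-delimited patterns in the other answer, this pattern also matches 'DK' or 'PK' inside a longer word.

```python
import re
import pandas as pd

df = pd.DataFrame(data={'ABC': [
    'JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ',  # contains both DK and PK
    'JWUFT_P_RECF_1_PK1_VWAP',                         # contains PK only (made-up row)
]})

# Either order is accepted: DK before PK, or PK before DK.
regex = re.compile(r"DK.*PK|PK.*DK")

# bool() turns the match object (or None) into True/False for each row.
df['HAS_BOTH'] = df['ABC'].apply(lambda s: bool(regex.search(s)))
print(df)
```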