I have a column
|ABC|
-----
|JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ|
I want to check whether the words 'DK' and 'PK' both appear in each row. I need to run this check with different words across the entire column.
match = ['DK', 'PK']
I tried df.ABC.str.split('_').isin(match), which splits each value into a list, but then I get this error:
SystemError: <built-in method view of numpy.ndarray object at 0x0000021171056DB0> returned a result with an error set
What is the best way to get the expected output, a boolean True/False for each row?
Thanks.
CodePudding user response:
Maybe either of the two following options:
(?:[A-Z\d]+_)*?([DP]K)\d*_(?:[A-Z\d]+_)*?(?!\1)([DP]K)\d*(?:_[A-Z\d]+)*?
See an online [demo](https://regex101.com/r/KyqtsT/10)
import pandas as pd
df = pd.DataFrame(data={'ABC': ['JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ']})
# Match against the ABC column; REX_TEST does not exist yet at this point
df['REX_TEST'] = df.ABC.str.match(r'(?:[A-Z\d]+_)*?([DP]K)\d*_(?:[A-Z\d]+_)*?(?!\1)([DP]K)\d*(?:_[A-Z\d]+)*?')
print(df)
Or, add leading/trailing underscores to your data before matching:
import pandas as pd
df = pd.DataFrame(data={'ABC': ['JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ']})
df['REX_TEST'] = '_' + df.ABC + '_'
df['REX_TEST'] = df.REX_TEST.str.match(r'(?=.*_PK\d*_)(?=.*_DK\d*_).*')
print(df)
Both options print:
                                              ABC  REX_TEST
0  JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ      True
Note that I wanted to make sure that neither 'DK' nor 'PK' is matched as a substring of a larger word.
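Since the question asks to repeat this with different words, the lookahead idea from the second option can be generalized by building one lookahead per word from the list. This is only a sketch: the helper name build_all_words_pattern, the HAS_ALL column, and the two extra sample rows are made up for illustration, and it assumes each word is an underscore-delimited token optionally followed by digits.

```python
import re
import pandas as pd

def build_all_words_pattern(words):
    # Hypothetical helper: one lookahead per word, so every word must occur
    # as its own underscore-delimited token (optionally followed by digits).
    return ''.join(rf'(?=.*_{re.escape(w)}\d*_)' for w in words)

# The second and third rows are invented test data, not from the question.
df = pd.DataFrame({'ABC': [
    'JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ',
    'JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_XYZ',
    'ADK1_PK1',
]})

match = ['DK', 'PK']
pattern = build_all_words_pattern(match)

# Pad with underscores so the first and last tokens are delimited as well.
padded = '_' + df['ABC'] + '_'
df['HAS_ALL'] = padded.str.contains(pattern, regex=True)
print(df)
```

Only the first row should come out True here: the second has no 'PK' token, and in the third 'DK' appears only inside the larger token 'ADK1'.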
CodePudding user response:
You can use Python's re library to search a string:
import re
s = "JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ"
r = re.search(r"(DK).*(PK)|(PK).*(DK)",s) # the pipe is used like "or" keyword
If both words are found, r is a match object, which is truthy; otherwise r is None:
if r:
    print("hello world!")
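To run the same kind of check over the whole column instead of a single string, one possible approach is to combine one str.contains mask per word. This is a rough sketch with an invented HAS_ALL column and invented extra rows; like the re.search version above, it does plain substring matching, so 'DK' inside a longer word also counts.

```python
import pandas as pd

# The second and third rows are invented test data, not from the question.
df = pd.DataFrame({'ABC': [
    'JWUFT_P_RECF_1_DK1_VWAP_DFGDG_P_REGB_1_PK1_XYZ',
    'ABC_DEF',
    'XDKY_PKZ',
]})

match = ['DK', 'PK']

# Start with all True and AND in one substring test per word.
mask = pd.Series(True, index=df.index)
for word in match:
    mask &= df['ABC'].str.contains(word, regex=False)

df['HAS_ALL'] = mask
print(df)
```

Swapping in a different word list only requires changing match; the loop itself stays the same.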