How to search for a string with spaces within another string in Python?

Time:09-16

I want to find sentences such as "XYZ masked 111" or "Hello My Add" that contain phrases like "masked 111" or "My Add no", and blank them out, in Python. How can I do that? I tried modifying the code below, but it does not work because of the spaces in the phrases.

import re

def garbagefin(x):
    k = " ".join(re.findall("[a-zA-Z0-9] ", x))
    print(k)
    t = re.split(r'\s', k)
    print(t)

    Glist = {'masked 111', 'DATA', "My Add no", 'MASKEDDATA'}

    for n, m in enumerate(t):  ## to remove entire ID
        if m in Glist:
            return ''
        else:
            return x

The outputs that I am expecting is:

garbagefin("I am masked 111")-Blank
garbagefin("I am My Add No")-Blank
garbagefin("I am My add")-I am My add
garbagefin("I am My MASKEDDATA")-Blank

CodePudding user response:

It seems you don't actually need a regex here; the ordinary in operator is enough.

def garbagefin(x):
    return "" if any(text in x for text in Glist) else x

If the matching should be case insensitive, case-fold both the phrases and the input before comparing.

Glist = {text.casefold() for text in Glist}
...
def garbagefin(x):
    x_lower = x.casefold()
    return "" if any(text in x_lower for text in Glist) else x

Output

1. 
2. 
3. I am My add
4. 
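
As a quick check, the case-insensitive version can be run end to end against the four inputs from the question. This is only a minimal sketch: the name GLIST_FOLDED is introduced here for illustration and is not part of the original answer.

```python
# Phrases that should blank out a sentence (taken from the question).
Glist = {"masked 111", "DATA", "My Add no", "MASKEDDATA"}

# Case-fold the phrases once so each call only has to fold its input.
# GLIST_FOLDED is an illustrative name, not from the original answer.
GLIST_FOLDED = {text.casefold() for text in Glist}


def garbagefin(x):
    """Return '' if x contains any phrase from Glist (ignoring case), else x."""
    x_folded = x.casefold()
    return "" if any(text in x_folded for text in GLIST_FOLDED) else x


for sentence in ["I am masked 111", "I am My Add No",
                 "I am My add", "I am My MASKEDDATA"]:
    print(repr(garbagefin(sentence)))
```

Note that in tests for substrings, not whole words, so a sentence containing "metadata" would also be blanked because it contains "data".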

CodePudding user response:

You can also use a regex approach like this:

import re

Glist={'masked 111', 'DATA',"My Add no" , 'MASKEDDATA',}
glst_rx = r"\b(?:{})\b".format("|".join(Glist))

def garbagefin(x):
    if re.search(glst_rx, x, re.I):
        return ''
    else:
        return x

The glst_rx = r"\b(?:{})\b".format("|".join(Glist)) code generates a regex such as \b(?:My Add no|DATA|MASKEDDATA|masked 111)\b. Because Glist is a set, the order of the alternatives can differ between runs.

It matches the strings from Glist as whole words, ignoring case (note the re.I flag in re.search(glst_rx, x, re.I)). If a match is found, an empty string is returned; otherwise the input string is returned unchanged.
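
The pattern above interpolates the phrases as they are, which is fine for plain letters, digits and spaces. A slightly more defensive variant, sketched here on the assumption that every phrase should be matched literally, escapes each phrase with re.escape and compiles the pattern once. GLIST_RX and alternatives are illustrative names, not part of the original answer.

```python
import re

Glist = {"masked 111", "DATA", "My Add no", "MASKEDDATA"}

# Escape each phrase so characters such as "." or "+" are matched literally.
# Sorting (longest first, then alphabetically) makes the pattern reproducible
# even though Glist is an unordered set.
alternatives = sorted(map(re.escape, Glist), key=lambda s: (-len(s), s))
GLIST_RX = re.compile(r"\b(?:{})\b".format("|".join(alternatives)), re.IGNORECASE)


def garbagefin(x):
    """Return '' if x contains a whole-word, case-insensitive match, else x."""
    return "" if GLIST_RX.search(x) else x


for sentence in ["I am masked 111", "I am My Add No",
                 "I am My add", "I am My MASKEDDATA", "my metadata"]:
    print(repr(garbagefin(sentence)))
```

Unlike a plain substring check, the \b word boundaries leave "my metadata" untouched, because "data" there is only part of a longer word.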

If Glist contains many items, you could use a regex trie instead of a flat alternation; the trieregex library can generate such patterns.
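
To show what a regex trie looks like, here is a hand-rolled sketch that uses only the standard library. It is not the trieregex API; build_trie, trie_to_regex and GLIST_RX are hypothetical helper names, and the sketch assumes a non-empty list of phrases. Phrases that share a prefix (such as "masked 111" and "maskeddata" once lowercased) are merged into a single branch.

```python
import re


def build_trie(words):
    """Store words in a nested dict; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Turn a trie node into a pattern matching every suffix stored below it."""
    branches = [re.escape(ch) + trie_to_regex(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:        # only the end-of-word marker is left
        return ""
    optional = "" in node   # a word ends here, so anything further is optional
    if len(branches) == 1 and not optional:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if optional else group


Glist = {"masked 111", "DATA", "My Add no", "MASKEDDATA"}

# Lowercase first so phrases differing only in case share trie branches;
# re.IGNORECASE then makes the match itself case insensitive.
trie = build_trie(text.lower() for text in Glist)
GLIST_RX = re.compile(r"\b" + trie_to_regex(trie) + r"\b", re.IGNORECASE)


def garbagefin(x):
    """Return '' if x contains a whole-word phrase from Glist, else x."""
    return "" if GLIST_RX.search(x) else x
```

Printing GLIST_RX.pattern shows the nested groups; for a handful of phrases the flat alternation is just as good, so this is mainly of interest for long lists.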

CodePudding user response:

If you're just trying to find one string inside another, I don't think you need code that complicated. You can also store the key strings in a list.

Simply use the in operator and return as soon as a match is found.

def garbagefin(x):
    L = ["masked 111", "DATA", "My Add no", "MASKEDDATA"]
    for i in L:
        if i in x:  # case-sensitive substring check
            print("Blank")
            return
    print(x)  # no phrase matched, so show the input unchanged