How to create regex pattern that removes elements equal to a substring, if and only if it finds an e

Time:08-08

I'm having trouble creating a regex for the following case: when a string contains an enumeration (element, element, element, element_different, ... and element), it should remove every element equal to hyjk11l and keep only the elements that differ from it.

Example 1:

Input string:

"I come with hyjk11l, Mary Johnson, hyjk11l, hyjk11l, hyjk11l and hyjk11l to the center, maybe we'll buy something there"

The output that I need:

"I come with Mary Johnson to the center, maybe we'll buy something there"

Example 2:

Input string:

"In afternoon, I show hyjk11l, John, hyjk11l, and hyjk11l in the lab"

The output that I need:

"In afternoon, I show John in the lab"

Example 3:

Input string:

"I meet with Katy Perry and hyjk11l here"

The output that I need:

"I meet with Katy Perry here"

I have tried the replace() function and some regex combinations, but I don't get the desired result. I thought about using replace() to remove every ", hyjk11l", "hyjk11l", ", and hyjk11l" and/or "and hyjk11l", but that seems complicated because I don't know how many times I would have to do it. The solution needs to be general: the input string is not known in advance, which is why I'm looking for a regex.

CodePudding user response:

Here is what you can do:

import re

inputs = ["I come with hyjk11l, Mary Johnson, hyjk11l, hyjk11l, hyjk11l and hyjk11l to the center, maybe we'll buy something there",
          "In afternoon, I show hyjk11l, John, hyjk11l, and hyjk11l in the lab",
          "I meet with Katy Perry and hyjk11l here",
          "I meet him and hyjk11l and her there"
         ]

pat = r"((?:[\s \,]|\s?and)\s?hyjk11l(?:[\s\,]?)(?=\s))"
for inp in inputs:
    tmp = re.sub(pat, "", inp)
    print(tmp)

Output:

I come with Mary Johnson to the center, maybe we'll buy something there
In afternoon, I show John in the lab
I meet with Katy Perry here
I meet him and her there

Check the regex at Regex101.

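Explanation of the pattern:

  • (?:[\s \,]|\s?and) : non-capturing group, matches exactly one whitespace character or comma, OR an optional whitespace character followed by the literal word and
  • \s? : 0 or 1 whitespace character
  • hyjk11l : matches this word literally
  • (?:[\s\,]?) : non-capturing group, matches 0 or 1 whitespace character or comma
  • (?=\s) : positive lookahead, the match must be followed by a whitespace character, which is not consumed
  • the outer parentheses turn the whole pattern into one capturing group; re.sub replaces each match with ""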
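
If the placeholder is not always hyjk11l, the same pattern can be wrapped in a small helper. This is only a sketch: remove_token is a hypothetical name that does not appear in the answer above, and re.escape is used on the assumption that the placeholder should be matched as literal text. The redundant outer group and the escaped comma are dropped, which should not change what is matched.

```python
import re


def remove_token(text, token):
    """Remove list items equal to `token`, plus the comma or "and" before them.

    Hypothetical helper; it reuses the structure of the pattern explained above.
    """
    escaped = re.escape(token)  # assumption: the token is literal text, not a regex
    pattern = (
        r"(?:[\s,]|\s?and)"  # one whitespace char or comma, OR optional whitespace + "and"
        r"\s?"               # optional whitespace before the token
        + escaped +          # the placeholder itself
        r"[\s,]?"            # optional trailing whitespace or comma
        r"(?=\s)"            # must be followed by whitespace, which is not consumed
    )
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    print(remove_token("I meet with Katy Perry and hyjk11l here", "hyjk11l"))
    # prints: I meet with Katy Perry here
```

Like the original pattern, this leaves a placeholder in place when it sits at the very end of the string or directly before punctuation such as a period, because the (?=\s) lookahead requires a whitespace character after it.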