Exact search of a string that has parentheses using regex

Time:07-29

I am new to regexes.

I have the following string: \n(941)\n364\nShackle\n(941)\nRivet\n105\nTop

Out of this string, I want to extract Rivet and I already have (941) as a string in a variable.

My thought process was like this:

  1. Find all the (941)s
  2. Filter the results by checking whether (941) is followed by \n, then a word, then another \n.
  3. I made a regex for the 2nd part: \n[\w\s\'\d\-\/\.]+$\n.

The problem I am facing is that, because of the parentheses in (941), the regex treats 941 as a group. The regex in the 3rd step may be wrong, which I can fix later, but first I need help finding the 2nd (941) so that I can apply the 3rd step to it.
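To make the problem concrete, here is a minimal sketch of what happens when the variable is used as a pattern unchanged, compared with escaping it first. The name marker is only an illustrative stand-in for the variable holding (941).

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
marker = '(941)'  # illustrative name for the variable holding the search text

# Unescaped: the parentheses form a capturing group around the digits 941,
# so findall returns the group contents without the parentheses.
unescaped = re.findall(marker, s)

# Escaped: re.escape() turns "(" and ")" into literal characters.
escaped = re.findall(re.escape(marker), s)

print(unescaped)  # ['941', '941']
print(escaped)    # ['(941)', '(941)']
```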

PS.

  1. I know I can use Python string methods like find and then loop over the results, but I wanted to see if this can be done directly using regex only.
  2. I have tried the following: (?:...), (941){1}, and escaping the parentheses with \ to make them literal characters, like this: \(941\), with no useful results. Maybe I am using them wrong.
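For comparison, the string-method route mentioned in the first point could look roughly like the sketch below. It splits on \n rather than calling find in a loop, the helper name lines_after is hypothetical, and skipping digit-only lines is an assumption about what should count as a word.

```python
def lines_after(marker, text):
    """Collect each line that directly follows `marker`, skipping digit-only lines."""
    lines = text.split('\n')
    found = []
    # Walk over consecutive (line, next line) pairs.
    for current, following in zip(lines, lines[1:]):
        if current == marker and not following.isdigit():
            found.append(following)
    return found

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
print(lines_after('(941)', s))  # ['Rivet']
```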

I just wanted to know whether this can be done using regex. I thought it might be useful for others too, or a good share for future viewers.

Thanks!

CodePudding user response:

Assuming:

  • You want to skip a line that consists only of digits;
  • You want to match a substring made of word characters (which may include digits);

Try escaping the variable with re.escape() and interpolating it into the regular expression with an f-string:

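import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'
# Escape the parentheses so they match literally instead of forming a group
var2 = re.escape(var1)
# (?!\d+\n) rejects a digits-only line; (\w+) captures the word after the marker
m = re.findall(fr'{var2}\n(?!\d+\n)(\w+)', s)[0]
print(m)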

Prints:

Rivet
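Indexing the findall result with [0] raises IndexError when nothing matches. A small sketch of the same pattern wrapped in a function that returns None in that case; the function name word_after is hypothetical, and using re.search to take only the first hit is a design choice, not something the question requires.

```python
import re

def word_after(marker, text):
    """Return the first word on the line after `marker`, skipping digit-only lines."""
    # re.escape() makes any regex metacharacters in marker literal.
    pattern = re.escape(marker) + r'\n(?!\d+\n)(\w+)'
    match = re.search(pattern, text)
    return match.group(1) if match else None

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
print(word_after('(941)', s))  # Rivet
print(word_after('(999)', s))  # None
```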

CodePudding user response:

If you have text in a variable that should be matched exactly, use re.escape() to escape it when substituting into the regexp.

import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
# re.escape() makes the parentheses in num literal
re.findall(rf'(?<=\n{re.escape(num)}\n)[\w\s\'\d\-\/\.]+(?=\n)', s)

This puts \n(941)\n in a lookbehind and the trailing \n in a lookahead, so neither is included in the match. This avoids a problem with the \n at the end of one match overlapping with the \n at the beginning of the next.
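One thing worth checking when running this: the character class contains \s, which also matches \n, so a match can run across line breaks. The sketch below compares that class with a narrower variant. The variant is an assumption about the intent, not part of the answer above: it allows a plain space instead of \s and adds the (?!\d+\n) check from the other answer to skip digit-only lines.

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'

# Class from the answer: \s matches \n too, so each match spans two lines here.
wide = re.findall(rf"(?<=\n{re.escape(num)}\n)[\w\s\'\d\-\/\.]+(?=\n)", s)

# Assumed variant: a literal space replaces \s, and (?!\d+\n) skips digit-only lines.
narrow = re.findall(rf"(?<=\n{re.escape(num)}\n)(?!\d+\n)[\w '\-/.]+(?=\n)", s)

print(wide)    # ['364\nShackle', 'Rivet\n105']
print(narrow)  # ['Rivet']
```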
