I am new to regexes.
I have the following string: \n(941)\n364\nShackle\n(941)\nRivet\n105\nTop
Out of this string, I want to extract Rivet, and I already have (941) as a string in a variable.
My thought process was like this:
- Find all the (941)s
- Filter the results by checking if the string after (941) is followed by \n, followed by a word, and ending with \n
- I made a regex for the 2nd part: \n[\w\s\'\d\-\/\.]+$\n
The problem I am facing is that, because of the parentheses in (941), the regex treats 941 as a group. The regex in the 3rd step may be wrong, which I can fix later, but first I need help finding the 2nd (941) so that I can then apply the 3rd step to it.
PS.
- I know I can use Python string methods like find and then loop over the results, but I wanted to see if this can be done directly using regex only.
- I have tried the following regexes: (?:...), (941){1}, and escaping the parentheses with \ to make them literal, like this: \(941\), with no useful results. Maybe I am using them wrong.
I just wanted to know if this can be done using regex. I thought it might be useful for others too, or a good share for future viewers.
Thanks!
CodePudding user response:
Assuming:
- You want to avoid matching a value made up only of digits;
- You want to match a substring made of word characters (thus including possible digits);
Escape the variable with re.escape() and use it in the regular expression through an f-string:
import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'
var2 = re.escape(var1)
# (?!\d+\n) skips a following line made up only of digits; (\w+) captures the word
m = re.findall(fr'{var2}\n(?!\d+\n)(\w+)', s)[0]
print(m)
Prints:
Rivet
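To see why the escaping step matters, here is a small sketch comparing the unescaped and escaped patterns on the same string. The variable names (raw, escaped, and so on) are only illustrative and not part of the answer above.

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
raw = '(941)'

# Unescaped, the parentheses form a capturing group, so the pattern
# matches the bare digits 941 and never the literal "(" and ")".
unescaped_hits = re.findall(raw, s)

# re.escape() backslash-escapes the parentheses so they match literally.
escaped = re.escape(raw)
escaped_hits = re.findall(escaped, s)

print(unescaped_hits)  # ['941', '941']
print(escaped)         # \(941\)
print(escaped_hits)    # ['(941)', '(941)']
```

This is the "941 as a group" behaviour described in the question: with a group in the pattern, findall returns the group contents rather than the whole match.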
CodePudding user response:
If you have text in a variable that should be matched exactly, use re.escape() to escape it when substituting into the regexp.
import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
# Lookbehind for \n(941)\n, then the value, then a lookahead for the trailing \n
print(re.findall(rf'(?<=\n{re.escape(num)}\n)[\w\s\'\d\-\/\.]+(?=\n)', s))
This puts \n(941)\n in a lookbehind and the trailing \n in a lookahead, so neither is included in the match. This avoids a problem with the \n at the end of one match overlapping with the \n at the beginning of the next.
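One caveat with the character class taken from the question: it contains \s, which also matches \n, so a single match can run across more than one line. As a sketch, assuming the wanted value is a single line that is not made up only of digits, the class can be limited to non-newline characters and combined with a negative lookahead. The helper name value_after is hypothetical.

```python
import re

def value_after(marker, text):
    """Return single-line values that directly follow `marker` on its own line,
    skipping values made up only of digits."""
    pattern = (
        rf'(?<=\n{re.escape(marker)}\n)'  # marker line, not part of the match
        r'(?!\d+\n)'                      # reject a line of digits only
        r'[^\n]+'                         # the value: everything up to the newline
        r'(?=\n)'                         # trailing newline, not consumed
    )
    return re.findall(pattern, text)

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
result = value_after('(941)', s)
print(result)  # ['Rivet']
```

Because the marker is passed through re.escape(), the lookbehind always has a fixed width, which Python's re module requires.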