Example first:
import re
details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'
input_re = re.compile(r'(?!output[0-9]*) mem([0-9a-f]+)')
print(input_re.findall(details))
# Out: ['001', '005', '002', '006']
I am using a negative lookahead to extract the hex part of the mem entries that are not preceded by an output entry; however, as you can see, it fails. The desired output is: ['001', '002'].
What am I missing?
CodePudding user response:
You may use this regex in findall:
\b(?!output\d+)\w+\s+mem([a-fA-F\d]+)
RegEx Details:
\b : Word boundary
(?!output\d+) : Negative lookahead to assert that we don't have output and 1+ digits ahead
\w+ : Match 1+ word characters
\s+ : Match 1+ whitespaces
mem([a-fA-F\d]+) : Match mem followed by 1+ of any hex character
Code:
import re
s = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'
print(re.findall(r'\b(?!output\d+)\w+\s+mem([a-fA-F\d]+)', s))
Output:
['001', '002']
CodePudding user response:
Maybe an easier approach is to split it up into two regular expressions. First, filter out anything that starts with output and is followed by mem, like so:
output[0-9]* mem([0-9a-f]+)
Once those matches are removed, what remains is:
input1 mem001 data2 mem002
When you have filtered them out, just search for mem again:
mem([0-9a-f]+)
That would result in your desired output:
['001', '002']
Maybe not an answer to the original question, but it is a solution to your problem.
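The two steps above can be sketched in Python as follows. This is only one way to do it: using re.sub for the filtering step and the variable names without_outputs and mem_ids are my own choices, not something from the question.

```python
import re

details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'

# Step 1: remove every "output<digits> mem<hex>" pair from the text.
# (re.sub is an assumed choice for the "filter out" step.)
without_outputs = re.sub(r'output[0-9]* mem[0-9a-f]+', '', details)

# Step 2: capture the hex part of the mem entries that are left.
mem_ids = re.findall(r'mem([0-9a-f]+)', without_outputs)

print(mem_ids)  # ['001', '002']
```

The removal leaves some extra whitespace behind, which does not matter here because the second pattern only looks for mem followed by hex characters.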
CodePudding user response:
First of all, let's understand why your original regex doesn't work:
A regex encapsulates two pieces of information: a description of a location within a text, and a description of what to capture from that location. Your original regex tells the regex matcher: "Find a location within the text where the following characters are not 'output' digits but they are ' mem' alphanumerics". Think of the logic of that expression: if the matcher finds a location in the text where the following characters are ' mem' alphanumerics, then, in particular, the following characters are not 'output' digits. Your lookahead does not add anything to the expression.
What you really need is to tell the matcher: "Find a location in the text where the following characters are ' mem' alphanumerics, and the previous characters are not 'output' digits". So what you really need is a look-behind, not a lookahead.
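A quick way to check this is to run the pattern from the question with and without the lookahead and compare the results. The variable names below are only illustrative.

```python
import re

details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'

# The pattern from the question, with the negative lookahead.
with_lookahead = re.findall(r'(?!output[0-9]*) mem([0-9a-f]+)', details)

# The same pattern with the lookahead removed.
without_lookahead = re.findall(r' mem([0-9a-f]+)', details)

print(with_lookahead)                       # ['001', '005', '002', '006']
print(with_lookahead == without_lookahead)  # True
```

Both patterns match at exactly the same positions, because any position followed by ' mem' can never also be followed by 'output'.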
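@ArtyomVancyan proposed a good regex with a look-behind, and it can be adapted to what you need: instead of a single digit after the 'output', you want potentially more digits. Note that simply putting an asterisk (*) after the '\d' will not work with Python's built-in re module, which only accepts fixed-width look-behinds. Either chain one look-behind per digit count you expect, or use the third-party regex module, which supports variable-width look-behinds.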
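Here is a minimal sketch of the look-behind idea with the built-in re module. The patterns are my own guess at what such a regex could look like, not necessarily the one @ArtyomVancyan posted, and the workaround assumes that at most two digits follow 'output', as in the sample text.

```python
import re

details = 'input1 mem001 output1 mem005 data2 mem002 output12 mem006'

# A variable-width look-behind such as (?<!output\d*) is rejected by
# the built-in re module, which requires fixed-width look-behinds.
try:
    re.compile(r'(?<!output\d*) mem([0-9a-f]+)')
except re.error as exc:
    print('re rejects the pattern:', exc)

# Workaround (assumption: "output" is followed by one or two digits):
# chain one fixed-width look-behind per expected digit count.
fixed_width = re.compile(r'(?<!output\d)(?<!output\d\d) mem([0-9a-f]+)')

print(fixed_width.findall(details))  # ['001', '002']
```

If the number of digits is not bounded, chaining look-behinds stops being practical, and one of the other approaches in this thread is probably simpler.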