My issue is that I am trying to write a regex pattern to find certain line(s). Here is the data:
| |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug'
|-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'
and here is the regex code
import subprocess as subs
import os
import re
file = "simple.cpp"
full_ast = subs.run(["clang -Xclang -ast-dump %s" % file], shell=True, stdout=subs.PIPE) # , (" > %s.xml" % title)
namespace_s = re.search("UsingDirectiveDecl\s0x([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?\s<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+][0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?\s'[^']*'", str(full_ast.stdout))
# not sure if re.search is the right module.
print(namespace_s)
I'm trying to match the bottom line only, but I haven't had any success. Two things I would like to happen. 1: where there is an offset like 0x1e840b8, I need it to match as 0x followed by 7 hex characters. Originally I tried 0x[a-z0-9]{7} but that didn't work. 2: How can I put the file name in? Would it work to use %s and then join the pattern with % file?
Any help is much appreciated
CodePudding user response:
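Regarding the regex, you are trying to match (?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?
in the places where the hex part is. That is a pattern for a decimal floating-point number, so it cannot match hex digits such as e or b in 1e840b8. You can use 0x[a-f0-9]{7}
instead.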
If you are matching, you don't need the lookahead (?=\.\d|\d)
There is also an extra closing bracket ]
that is not in the example data. It should be a :
<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+]
                                      ^
See for example this pattern:
UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x[a-f0-9]{7}\s'[^']*'
Example
import re
pattern = r"UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x[a-f0-9]{7}\s'[^']*'"
s = ("test | |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug' test\n"
"test |-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std' test")
print(re.findall(pattern, s))
Output
["UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'"]
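As for the second part of the question (putting the file name into the pattern), one possible approach is to build the pattern from the file name instead of hard-coding simple\.cpp. The sketch below is only an illustration: build_pattern is a hypothetical helper name, and it assumes the addresses are always 7 lowercase hex digits, as in the sample data. re.escape is used so that the dot in the file name is matched literally.

```python
import re

def build_pattern(filename):
    # Hypothetical helper: compile the pattern for a given source file name.
    # re.escape makes regex metacharacters in the name (such as ".") literal.
    return re.compile(
        r"UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<"
        + re.escape(filename)
        + r":[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+"
        r"\sNamespace\s0x[a-f0-9]{7}\s'[^']*'"
    )

# Sample data taken from the question.
sample = (
    "| |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug'\n"
    "|-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'"
)

file = "simple.cpp"
matches = build_pattern(file).findall(sample)
print(matches)
```

Formatting with % file would also work for this particular name, but only if the dot is escaped by hand. Escaping the name programmatically avoids that step when the file name changes.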