Regex key to match only a line(s)?

Time:07-06

I am trying to write a regex to find certain line(s) in clang's AST dump. Here is the data:

| |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug'
|-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'

And here is the code:

import subprocess as subs
import os
import re


file = "simple.cpp"

full_ast = subs.run(["clang -Xclang -ast-dump %s" % file], shell=True, stdout=subs.PIPE) # , (" > %s.xml" % title)


namespace_s = re.search(r"UsingDirectiveDecl\s0x([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?\s<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+][0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x([+-]?(?=\.\d|\d)(?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?\s'[^']*'", str(full_ast.stdout))

# not sure if re.search is the right function.

print(namespace_s)

I'm trying to match only the bottom line, but I haven't had any success. There are two things I would like:
1. Where there is an offset like 0x1e840b8, I need it to match as 0x followed by 7 hex characters. I originally tried 0x[a-z0-9]{7}, but that didn't work.
2. How can I put the file name into the pattern? Would it work to write %s in the pattern and then format it with % file?

Any help is much appreciated

Answer:

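Regarding the regex: in the places where the hex offsets are, you are matching (?:\d+)?(?:\.?\d*))(?:[eE]([+-]?\d+))?, which is part of a pattern for decimal floating-point numbers (digits, an optional decimal point and an optional exponent), not hex. You can use 0x[a-f0-9]{7} instead.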

With that change you also don't need the lookahead (?=\.\d|\d), which only belongs to the floating-point pattern.
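The hex part can be checked on its own. The sample offsets each have exactly 7 hex digits, so 0x[a-f0-9]{7} matches them; the 0x[a-z0-9]{7} from the question would match them too (a-z includes a-f), so that attempt most likely failed because of the other problems in the pattern rather than the hex part. A minimal sketch, assuming real clang dumps may print longer addresses than the sample (the longer value below is made up for illustration), showing why 0x[0-9a-f]+ can be the safer choice:

```python
import re

# Offsets copied from the sample ast-dump lines in the question.
offsets = ["0x16de688", "0x1e840b8", "0x16de588", "0x1378e98"]

exactly_seven = re.compile(r"0x[a-f0-9]{7}")  # the answer's pattern
any_length = re.compile(r"0x[0-9a-f]+")       # assumption: tolerate longer addresses

for off in offsets:
    # fullmatch requires the whole string to match, not just a prefix
    print(off, bool(exactly_seven.fullmatch(off)), bool(any_length.fullmatch(off)))

# Hypothetical longer address, as a 64-bit build might print it.
longer = "0x55d5c3a1b2c8"
print(longer, bool(exactly_seven.fullmatch(longer)), bool(any_length.fullmatch(longer)))
```

Inside the full pattern, {7} is followed by \s, so an address with more than 7 digits would make the whole line fail to match rather than match partially.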

There is also a closing bracket ] in the pattern that does not occur in the example data; it should be a colon (:) instead:

<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+]
                                      ^
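The effect of that single character can be seen by running just the location part of the pattern against the matching sample line, once with the stray ] and once with the colon. A minimal sketch; the two fragment names are only for illustration:

```python
import re

line = "|-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'"

# The stray ] expects a literal "]" after "col", but the data has ":17>".
broken = r"<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+][0-9]+>"
# Replacing ] with : lines the fragment up with "<simple.cpp:2:1, col:17>".
fixed = r"<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>"

print(re.search(broken, line))          # None
print(re.search(fixed, line).group(0))  # <simple.cpp:2:1, col:17>
```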

See for example this pattern:

UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x[a-f0-9]{7}\s'[^']*'

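On the comment in the question about re.search: it is a function (not a module), and it works for this job. It scans the whole string and returns the first match as a Match object, or None. Printing that object only shows its repr, so use .group(0) to get the matched text. re.findall, by contrast, returns every match as a list of strings. A minimal sketch with the corrected pattern and the two sample lines (the variable name dump is just for illustration):

```python
import re

pattern = r"UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x[a-f0-9]{7}\s'[^']*'"

dump = (
    "| |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug'\n"
    "|-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'\n"
)

# re.search scans the whole string and stops at the first match.
namespace_s = re.search(pattern, dump)

if namespace_s is None:
    print("no match")
else:
    # group(0) is the matched text; span() is its (start, end) position in dump.
    print(namespace_s.group(0))
    print(namespace_s.span())
```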

Example

import re

pattern = r"UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x[a-f0-9]{7}\s'[^']*'"

s = ("test | |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug' test\n"
     "test |-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std' test")

print(re.findall(pattern, s))

Output

["UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'"]
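The second part of the question, putting the file name into the pattern, is not covered above. %s formatting does work here, because the pattern contains no other % characters, but the name should go through re.escape so that the . in simple.cpp only matches a literal dot. A minimal sketch; the helpers dump_ast and find_using_directives are hypothetical, dump_ast assumes clang is on PATH, and the hex part uses 0x[0-9a-f]+ on the assumption that addresses may be longer than 7 digits. Capturing groups are added for the two offsets and the namespace name, which is why re.finditer is used instead of re.findall. dump_ast also passes text=True so stdout arrives as a str; calling str() on the raw bytes, as in the question, gives the bytes repr (b'...') with newlines shown as a literal \n.

```python
import re
import subprocess


def dump_ast(path):
    """Hypothetical helper: return clang's AST dump of `path` as text (assumes clang is on PATH)."""
    result = subprocess.run(
        ["clang", "-Xclang", "-ast-dump", "-fsyntax-only", path],
        capture_output=True,
        text=True,  # decode stdout to str instead of calling str() on bytes
        check=True,
    )
    return result.stdout


def find_using_directives(ast_text, filename):
    """Hypothetical helper: yield (decl_offset, namespace_offset, namespace_name)
    for using-directives whose location starts with `filename`."""
    pattern = (
        r"UsingDirectiveDecl\s(0x[0-9a-f]+)\s+<%s:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>"
        r"\s[a-zA-Z]+:[0-9]+\sNamespace\s(0x[0-9a-f]+)\s'([^']*)'"
    ) % re.escape(filename)  # escape so "." in the file name is literal
    for m in re.finditer(pattern, ast_text):
        yield m.group(1), m.group(2), m.group(3)


sample = (
    "| |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug'\n"
    "|-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'\n"
)
print(list(find_using_directives(sample, "simple.cpp")))
```

With a real file this would be called as find_using_directives(dump_ast(file), file). If you keep 0x[a-f0-9]{7} and prefer an f-string, the braces have to be doubled ({{7}}) so the f-string does not treat them as a placeholder.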