Why can't I regex a converted bytes to string?

Time:07-06

My issue is that running a regex against a string variable works, but when I convert a bytes object to a string and run the same regex against it, re.findall returns an empty list. Here is my code.

simple.cpp:

#include <iostream>
using namespace std;

string a = "ABC";

int main() {
  cout << "Hello World!";
  return 0;
} 

program.py:

import subprocess as subs
import re


file = "simple.cpp"

full_ast = subs.run(["clang -Xclang -ast-dump %s" % file], shell=True, stdout=subs.PIPE)


s = ("test | |-UsingDirectiveDecl 0x16de688 <line:58:3, col:24> col:24 Namespace 0x16de588 '__debug' test\n"
 "test |-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std' test")


pattern = r"UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<simple\.cpp:[0-9]+:[0-9]+,\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x[a-f0-9]{7}\s'[^']*'"

s_full_ast = str(full_ast.stdout)
namespace_s = re.findall(pattern, s) # Switch between s and s_full_ast
print(namespace_s)

I want to know why it's not working and how I can fix it. Any help is much appreciated.

Answer:

You aren't creating the string you think you are. Calling str() on a bytes object returns its repr, not the decoded text:

>>> str(b'foo')
"b'foo'"  # not 'foo'

You want to decode the bytes value instead.

>>> b'foo'.decode()
'foo'
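
To see why the regex comes back empty, here is a minimal sketch. The bytes value is a made-up stand-in for clang's output (real output will differ), and the pattern is a shortened version of the one in the question. When the bytes contain both single and double quotes, the repr produced by str() escapes every single quote as \' and every newline as \n, so the literal ' the pattern expects after the space is no longer there.

```python
import re

# Made-up stand-in for the bytes clang writes to stdout (an assumption, not real output).
raw = b"Namespace 0x1378e98 'std'\n\"Hello World!\"\n"

# Shortened version of the question's pattern, for illustration only.
pattern = r"Namespace\s0x[a-f0-9]{7}\s'[^']*'"

as_repr = str(raw)      # the repr text: starts with b' and contains \' and \n escapes
as_text = raw.decode()  # the actual text, decoded as UTF-8 by default

print(as_repr)
print(re.findall(pattern, as_repr))  # []
print(re.findall(pattern, as_text))  # ["Namespace 0x1378e98 'std'"]
```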

subprocess can do this for you if you pass text=True (available since Python 3.7):

>>> import subprocess
>>> subprocess.run("echo foo", shell=True, stdout=subprocess.PIPE).stdout
b'foo\n'
>>> subprocess.run("echo foo", shell=True, stdout=subprocess.PIPE, text=True).stdout
'foo\n'

(If the command's output is not in your locale's default encoding, also pass the encoding argument, e.g. encoding="utf-8", so the bytes are decoded to a str correctly.)
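
Applied to the question's program, the fix might look like the sketch below. To keep it runnable without clang, a child Python process that prints one AST-like line stands in for the clang command; that stand-in and the sample line are assumptions. The pattern is the question's, written with + quantifiers, which is my reading of the intended regex.

```python
import re
import subprocess
import sys

# Hypothetical sample line, copied from the question's test string.
line = "|-UsingDirectiveDecl 0x1e840b8 <simple.cpp:2:1, col:17> col:17 Namespace 0x1378e98 'std'"

# Stand-in for ["clang", "-Xclang", "-ast-dump", "simple.cpp"]: a child process that prints the line.
result = subprocess.run(
    [sys.executable, "-c", f"print({line!r})"],
    stdout=subprocess.PIPE,
    text=True,          # decode stdout to str instead of returning bytes
    encoding="utf-8",   # explicit encoding; adjust if the command emits something else
)

pattern = (
    r"UsingDirectiveDecl\s0x[a-f0-9]{7}\s+<simple\.cpp:[0-9]+:[0-9]+,"
    r"\s[a-zA-Z]+:[0-9]+>\s[a-zA-Z]+:[0-9]+\sNamespace\s0x[a-f0-9]{7}\s'[^']*'"
)

namespace_s = re.findall(pattern, result.stdout)
print(namespace_s)
```

Because result.stdout is already a str here, no str() or decode() call is needed before re.findall.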
