regex.findall returns a list instead of a dict

Time:08-05

I'm trying to extract the result of a shell command. The reason I need to use the shell inside Python is that I need to run a Golang binary.

cmd = f'echo {domain} | /root/go/bin/crawler -subd

The Go binary crawler should output "a string that contains a JSON". First, I need to extract the JSON from that string using regex.

import regex
regu = regex.compile(r'\{(?:[^{}]|(?R))*\}')
cmd = regu.findall(cmd)

The main goal is to extract a JSON value from the findall result.

cmd = cmd['status']['http']
for i in cmd:
   if i['codes']=='200':
      stuff
   else:
      stuff

The code above fails because findall returns a list of strings, not a dict. As another attempt, I tried dumping the result with the json package.

import json
dummy = json.dumps(cmd)
cmd = dummy['status']['http']

But json.dumps() adds an unnecessary \ in front of each string.

{\'status':{\'http':{\'codes': \'200'}}}

This means I would need another regex, or something else, to remove the \. Meanwhile, findall returns:

['{'status':{'http':{'codes':'200'}}}']

How can I turn the findall result into a plain dict, so that values can be extracted with dummy['status']['http']?

UPDATE 1: Another attempt uses groupdict and finditer:

regu = regex.compile(r'\{(?:[^{}]|(?R))*\}')
cmd = regu.finditer(cmd)
cmd = cmd.groupdict()["statuses"]["http"]

This causes yet another problem:

AttributeError: '_regex.Scanner' object has no attribute 'groupdict'

UPDATE 2: In case anyone is curious about the crawler output:

b'time="2022-08-04" msg="starte"\ntime="2022-08-04" level=dbg msg="finished"\n{"status":{"http":{"codes":200}}}\n'

I had to use regex to strip out all the unnecessary log text.

CodePudding user response:

Assuming the JSON is on the last line and your output is stored in a variable named out:

import json
cmd = json.loads(out.decode('utf-8').strip().rsplit('\n', 1)[-1])
print(cmd)

output:

{'status': {'http': {'codes': 200}}}
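
If the JSON line is not guaranteed to be the last one, a slightly more defensive variant is to try each line in turn. This is only a minimal sketch: it assumes out holds the bytes shown in UPDATE 2 and that the JSON sits on a single line, and extract_json is a hypothetical helper name, not part of any library.

```python
import json

def extract_json(out: bytes) -> dict:
    """Return the first line of `out` that parses as a JSON object.

    Hypothetical helper; assumes the JSON object sits on a single line.
    """
    for line in out.decode("utf-8").splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip log lines such as time="..." msg="..."
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            continue  # started with "{" but was not valid JSON; keep scanning
    raise ValueError("no JSON object found in crawler output")

# Sample bytes copied from UPDATE 2 in the question
out = b'time="2022-08-04" msg="starte"\ntime="2022-08-04" level=dbg msg="finished"\n{"status":{"http":{"codes":200}}}\n'
data = extract_json(out)
print(data["status"]["http"]["codes"])
```

Note that in this sample codes is the integer 200, so a comparison against the string '200', as in the question, would be False. Also, data['status']['http'] is a dict, so looping over it yields its keys rather than nested dicts.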

CodePudding user response:

You only get a dictionary if the regexp has named capture groups; those captures are returned by the groupdict() method of a Match object, which you can get from re.search() or by iterating over the result of re.finditer(). The iterator returned by finditer() has no groupdict() of its own, which is why UPDATE 1 raises AttributeError. But you don't need a dictionary here. Just take the single match of the regexp and call json.loads() to parse it as JSON.

import json
import regex
regu = regex.compile(r'\{(?:[^{}]|(?R))*\}')
# cmd must be a str here; decode bytes output first, e.g. cmd.decode('utf-8')
cmd = json.loads(regu.search(cmd).group())
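
To illustrate the point about named capture groups, here is a small sketch using the standard re module on one of the log lines from UPDATE 2. The pattern and the group names time and msg are assumptions made up for this example; they are not something the crawler defines.

```python
import re

# Named groups (?P<name>...) are what populate Match.groupdict()
log_line = 'time="2022-08-04" level=dbg msg="finished"'
pattern = re.compile(r'time="(?P<time>[^"]*)".*msg="(?P<msg>[^"]*)"')

match = pattern.search(log_line)  # a Match object, or None if nothing matched
fields = match.groupdict() if match else {}
print(fields)

# finditer() returns an iterator of Match objects, so groupdict() is called
# on each item, not on the iterator itself
for m in pattern.finditer(log_line):
    print(m.groupdict())
```

This is only useful for picking fields out of the log lines. For the JSON part, json.loads() already produces the nested dict, so no named groups are needed.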