I'm trying to extract a result from a shell command. I need to call the shell from Python because I have to use a Go binary.
cmd = f'echo {domain} | /root/go/bin/crawler -subd
The Go binary crawler outputs a string that contains JSON. First, I need to extract the JSON from that string using a regex.
import regex
regu = regex.compile(r'\{(?:[^{}]|(?R))*\}')
cmd = regu.findall(cmd)
The main goal is to extract a JSON value from the findall result.
cmd = cmd['status']['http']
for i in cmd:
    if i['codes']=='200':
        stuff
    else:
        stuff
The code above fails because findall returns a list, not a dict. As another attempt, I tried to dump the result using the json package.
import json
dummy = json.dumps(cmd)
cmd = dummy['status']['http']
But json.dumps() adds an unnecessary \ in front of each string:
{\'status':{\'http':{\'codes': \'200'}}}
This means that I would need another regex, or something else, to remove the \. Meanwhile, findall returns:
['{'status':{'http':{'codes':'200'}}}']
How can I turn the findall result into a plain dict so that values can be extracted with dummy['status']['http']?
UPDATE 1:
Another attempt uses groupdict and finditer:
regu = regex.compile(r'\{(?:[^{}]|(?R))*\}')
cmd = regu.finditer(cmd)
cmd = cmd.groupdict()["statuses"]["http"]
This raises yet another problem:
AttributeError: '_regex.Scanner' object has no attribute 'groupdict'
UPDATE 2:
Someone might be curious about the crawler output:
b'time="2022-08-04" msg="starte"\ntime="2022-08-04" level=dbg msg="finished"\n{"status":{"http":{"codes":200}}}\n'
I had to use regex to remove all the unnecessary log lines.
CodePudding user response:
Assuming the JSON is on the last line and your output is named out:
import json
cmd = json.loads(out.decode('utf-8').strip().rsplit('\n', 1)[-1])
print(cmd)
output:
{'status': {'http': {'codes': 200}}}
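If the JSON is not guaranteed to be the very last line, a slightly more defensive variant is to walk the lines from the end and return the first one that parses. This is only a sketch: the helper name extract_last_json is hypothetical, and out is set to the sample bytes from UPDATE 2 rather than to real crawler output.

```python
import json


def extract_last_json(raw: bytes) -> dict:
    """Return the last line of `raw` that parses as a JSON object."""
    for line in reversed(raw.decode("utf-8").splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip log lines such as time="..." msg="..."
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            continue  # started with "{" but was not valid JSON; keep looking
    raise ValueError("no JSON object found in output")


# Sample bytes copied from the crawler output shown in the question.
out = (
    b'time="2022-08-04" msg="starte"\n'
    b'time="2022-08-04" level=dbg msg="finished"\n'
    b'{"status":{"http":{"codes":200}}}\n'
)

data = extract_last_json(out)
print(data["status"]["http"]["codes"])  # 200
```

In a real script, out would presumably be the stdout bytes of the crawler process, for example from subprocess.run(..., capture_output=True).stdout; the question does not show that step, so it is left out here.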
CodePudding user response:
You only get a dictionary if you have named capture groups in the regexp; those captures are returned by the groupdict() method of a Match object, which you get from re.search() or by iterating over the result of re.finditer(). That is why calling groupdict() directly on the finditer() result fails: it is an iterator, not a Match. But you don't need a dictionary here. Just get the single match of the regexp and call json.loads() to parse it as JSON.
import json
import regex
regu = regex.compile(r'\{(?:[^{}]|(?R))*\}')
# cmd must hold the crawler's output as str (decode it first if it is bytes)
cmd = json.loads(regu.search(cmd).group())
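If you would rather avoid the third-party regex module, the standard library can locate the embedded JSON itself with json.JSONDecoder.raw_decode(), which parses a JSON value starting at a given index and ignores whatever follows it. The sketch below assumes the output has already been decoded to str; the helper name first_json_object is hypothetical, and the sample text is the output from UPDATE 2.

```python
import json


def first_json_object(text: str) -> dict:
    """Decode the first JSON object embedded anywhere in `text`."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode returns (parsed_value, index_after_value)
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not start valid JSON; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found")


# Sample text copied from the crawler output shown in the question.
text = (
    'time="2022-08-04" msg="starte"\n'
    'time="2022-08-04" level=dbg msg="finished"\n'
    '{"status":{"http":{"codes":200}}}\n'
)

result = first_json_object(text)
print(result["status"]["http"])  # {'codes': 200}
```

Note that codes comes back as the integer 200, not the string '200', so a comparison such as result["status"]["http"]["codes"] == 200 is what matches this sample output.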