Parsing Json in Python using Rich - Why Does This Happen?

Time:12-31

I am using the rich library to pretty-print JSON data retrieved with aiohttp. It works great when printing the data directly from the API, formatting it nicely (with line breaks so that it is easy to read):

{
    'city': 'Haidian',
    'region_code': 'BJ',
    'os': None,
    'tags': [],
    'ip': 1699530633,
    'isp': 'China Education and Research Network Center',
    'area_code': None,
    'longitude': 116.28868,
    'last_update': '2021-12-16T05:42:00.377583',
    'ports': [8888],
    'latitude': 39.99064,
    'hostnames': [],
    'postal_code': None,
    'country_code': 'CN',
    'country_name': 'China',
    'domains': [],
    'org': 'China Education and Research Network',
    'data': [
        {
            '_shodan': {'options': {}, 'id': '1d25e274-18ce-4a3d-8e1c-73e5bf35bf76', 'module': 'http-simple-new', 'crawler': '42f86247b760542c0192b61c60405edc5db01d55'},
            'hash': -1008250258,
            'os': None,
            'opts': {},
            'timestamp': '2021-12-16T05:42:00.377583',
            'isp': 'China Education and Research Network Center',
            'port': 8888,
            'hostnames': [],
            'location': {'city': 'Haidian', 'region_code': 'BJ', 'area_code': None, 'longitude': 116.28868, 'country_name': 'China', 'postal_code': None, 'country_code': 'CN', 'latitude': 39.99064},
            'ip': 1699530633,
            'domains': [],
            'org': 'China Education and Research Network',
            'data': 'GET / HTTP/1.1\r\nHost: 101.76.199.137\r\n\r\n',
            'asn': 'AS4538',
            'transport': 'tcp',
            'ip_str': '101.x.199.x'
        }
    ],
    'asn': 'AS4538',
    'ip_str': '101.x.199.x'
}

The program then stores each result in a dictionary keyed by IP address, like this:

ipInfo = {}

async def host(ip):
    ret = await fetch(ip)  # fetch() is the aiohttp helper (not shown)
    ipInfo[ip] = ret       # store the API result under its IP address
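For context, a minimal sketch of how `host()` might be driven over a list of addresses. The question does not show this part, so running the calls concurrently with `asyncio.gather` is an assumption, and the `fetch()` below is a hypothetical stand-in for the real aiohttp helper that just returns a small dict so the example runs without network access.

```python
import asyncio

ipInfo = {}

async def fetch(ip):
    # Hypothetical stand-in for the aiohttp request; the real helper
    # would return the decoded API response as a dict.
    await asyncio.sleep(0)
    return {"ip_str": ip, "ports": [8888], "os": None}

async def host(ip):
    ret = await fetch(ip)
    ipInfo[ip] = ret  # one entry per address, keyed by the IP string

async def main(ips):
    # Run all lookups concurrently; each host() call fills in ipInfo.
    await asyncio.gather(*(host(ip) for ip in ips))

asyncio.run(main(["192.0.2.1", "198.51.100.7"]))
print(sorted(ipInfo))
```

At this point `ipInfo` is an ordinary Python dict, which matters for how it is later written to disk.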

After it has finished with a list of IP addresses, it writes this dictionary to a file. The problem is that when I load this data later to review it, rich does not format it nicely the way it does when the data comes straight from the API. It always ends up as one long run of text, like:

[{'hash': -644847518, 'timestamp': '2021-12-27T15:08:16.109960', 'isp': 'VNPT Corp', 'transport': 'tcp', 'data': 'GET / HTTP/1.1\r\nHost: 113.x.185.x\r\n\r\n', 'asn': 'AS45899', 'port': 5555, 'hostnames': ['static.vnpt.vn'], 
'location': {'city': 'Vị Thanh', 'region_code': '73', 'area_code': None, 'longitude': 105.47012, 'latitude': 9.78449, 'postal_code': None, 'country_code': 'VN', 'country_name': 'Viet Nam'}, 'ip': 1906751888, 'domains': ['vnpt.vn'], 
'org': 'Vietnam Posts and Telecommunications Group', 'os': None, '_shodan': {'crawler': 'd905ab419aeb10e9c57a336c7e1aa9629ae4a733', 'options': {}, 'id': '33f5bd73-c7d7-4dc0-beb8-b17afb53d931', 'module': 'http-simple-new', 'ptr': 
True}, 'opts': {}, 'ip_str': '113.x.185.x'}], 'asn': 'AS45899', 'city': 'Vị Thanh', 'latitude': 9.78449, 'isp': 'VNPT Corp', 'longitude': 105.47012, 'last_update': '2021-12-27T15:08:16.109960', 'country_name': 'Viet Nam', 
'ip_str': '113.x.185.x', 'os': None, 'ports': [5555]}

And that does not work for me because I need to be able to actually read it. The code I am currently using to parse it looks like:

if argsc.parse:
    _print(f'Opening {argsc.parse}')
    with open(argsc.parse, 'r') as f:
        contents = f.read()  # this is a plain str, not a dict
        rich.print(contents)
        exit(0)

I have tried using rich.print_json, printing the dictionary entries one at a time, and all sorts of other things. I did notice while writing this post that if the data is saved the way it appears in the first example, with the newlines, then it does print correctly, but I don't know how to save it that way either.
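A likely explanation, assuming the dictionary was written with something like `f.write(str(ipInfo))` (the write step is not shown): `str()` on a dict produces a single-line Python repr with single quotes and `None`, which is not valid JSON, so `json.loads` (and therefore `print_json`) rejects it. The standard-library `ast.literal_eval` can turn that repr back into a dict. A small sketch with a made-up record:

```python
import ast
import json

record = {"city": "Haidian", "os": None, "ports": [8888], "tags": []}

# What str(dict) writes: one line, single quotes, None instead of null.
as_repr = str(record)
print(as_repr)

try:
    json.loads(as_repr)
except json.JSONDecodeError as exc:
    # Single-quoted keys are not valid JSON, so decoding fails here.
    print("not JSON:", exc)

# literal_eval safely evaluates Python literals (dicts, lists, str, None, ...).
restored = ast.literal_eval(as_repr)
print(restored == record)
```

Once it is a dict again, passing `restored` to `rich.print` lays it out over multiple lines, like the first example.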

So my question is really two questions: 1) How do I save the data from rich so that it is stored the way I see it on the screen? 2) How do I parse JSON data in a file so that it is shown with the nice newline formatting from the first example? Is that even possible? Maybe that is the way it comes back from the API and it is being written differently. But I tried writing the data as-is, without adding it to a dictionary, and that did not work either.

Answer:

I figured it out. The answer was to save the output as if I were simply redirecting standard output to a file, as detailed in this blog. The function I ended up using is:

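from rich import print as rprint

def write_formatted(data):
    """
    Write pretty-printed (readable) output to a file.
    :param data: the object to pretty-print, e.g. the ipInfo dict
    :return: None; output is appended to <argsc.output>-parsable.txt
    """
    # rich.print accepts file=; the with block closes the file afterwards
    with open(f'{argsc.output}-parsable.txt', 'a') as f:
        rprint(data, file=f)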
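Why this works: when `rich.print` is given a dict rather than a string, it pretty-prints it, wrapping to the console width, and `file=` sends that same layout to the file instead of the terminal. A minimal sketch using a `Console` that writes into an in-memory buffer; the fixed `width=80` and `color_system=None` are assumptions chosen so the output is plain text and does not depend on the terminal size.

```python
import io

from rich.console import Console

record = {
    "city": "Haidian",
    "isp": "China Education and Research Network Center",
    "org": "China Education and Research Network",
    "ports": [8888],
    "os": None,
}

buf = io.StringIO()
# Plain-text console: fixed width, no ANSI color codes.
console = Console(file=buf, width=80, color_system=None)
console.print(record)  # too wide for one line, so it is expanded item by item

text = buf.getvalue()
print(text)
```

Note that the file then holds rich's display text (Python-style quotes and `None`), not JSON, so it is meant for reading rather than for `json.load`.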

And a function to later open that data and display it with rich:

def parse_data():
    _print(f'Opening {argsc.parse}')
    with open(argsc.parse, 'r') as f:
        contents = f.read()  # already formatted text, so printing keeps the layout
        rprint(contents)
        exit(0)

Answer:

When you read from a file you will get back a string. Rich won't do any formatting of that string, because it does not know that the string contains JSON.

You could decode that string into a Python object by using the built-in json module. Add import json to the top of your file, and then my_data = json.loads(f.read()), which will give you a dict or list you can then print.
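`json.loads` needs the file to contain real JSON, so the JSON route works best when the data is also saved as JSON. A sketch of the round trip, which also covers the first question in the post: write the dictionary with `json.dump` and an `indent` so the file itself is laid out over multiple lines, then read it back with `json.load`. The file name `ipinfo.json` is hypothetical, and a temporary directory is used so the example cleans up after itself.

```python
import json
import os
import tempfile

ipInfo = {
    "101.x.199.x": {"city": "Haidian", "os": None, "ports": [8888]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ipinfo.json")  # hypothetical output file name

    # Save: real JSON (double quotes, null), indented for readability.
    # ensure_ascii=False keeps names such as "Vị Thanh" readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ipInfo, f, indent=4, ensure_ascii=False)

    with open(path, encoding="utf-8") as f:
        saved_text = f.read()

    # Load: json.load turns the file contents back into a dict.
    with open(path, encoding="utf-8") as f:
        my_data = json.load(f)

print(saved_text)
print(my_data == ipInfo)
```

Printing `my_data` with `rich.print` then gives the multi-line layout from the first example.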

Alternatively, with Rich you can use the print_json function, which parses and pretty-prints a string containing JSON in a single step.

Put this at the start of your code:

from rich import print_json

Then add the following to your parse_data function:

print_json(f.read())
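Put together, a `parse_data` along these lines might look like the sketch below. It takes the path and a `Console` as parameters instead of reading the global `argsc`; that signature is hypothetical, chosen so the function can be exercised on its own. `Console.print_json` is the console-level counterpart of `rich.print_json`, and like it, this only works if the file contains valid JSON, for example one written with `json.dump`.

```python
import io
import json
import os
import tempfile

from rich.console import Console

def parse_data(path, console):
    # Hypothetical signature: path and console are passed in for testability.
    with open(path, encoding="utf-8") as f:
        # print_json decodes the string and pretty-prints it with indentation.
        console.print_json(f.read())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ipinfo.json")
    with open(path, "w", encoding="utf-8") as f:
        # Compact, single-line JSON on disk.
        json.dump({"city": "Haidian", "os": None, "ports": [8888]}, f)

    buf = io.StringIO()
    # Plain-text console writing into a buffer instead of the terminal.
    parse_data(path, Console(file=buf, width=80, color_system=None))

output = buf.getvalue()
print(output)
```

Even though the file on disk is a single line, the printed result is indented across several lines.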