How to read multiline json-like file with multiple JSON fragments separated by just a new line?

Time: 12-01

I have a JSON file that contains multiple JSON objects (each object may span multiple lines). Example:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Note that the file as a whole is not valid JSON (so the usual "read JSON in Python" code fails, as expected), but each individual fragment is complete, valid JSON. It looks like the file was produced by a logging tool that simply appends each new block as text to the end of the file.

As expected, the regular way of reading it, which I tried with the snippet below, fails:

import json

with open('run_log.json', 'r') as file:
    d = json.load(file)
    print(d)

It produces the expected error about invalid JSON:

JSONDecodeError: Extra data: line 3 column 1 (char 89)

How can I solve this, possibly using the json module? Ideally, I want to read the JSON file and get the runs list for a particular date only (e.g. 2022-11-30), but just being able to read all entries would be enough.

Answer:

This is NDJSON (newline-delimited JSON), not JSON.

It's a valid file format that is often confused with JSON.

Python has a third-party library for this, ndjson (install it with pip install ndjson).

import ndjson

with open('run_log.json','r') as file:
    d = ndjson.load(file)
    for elem in d:
        print(type(elem), elem)

Output:

<class 'dict'> {'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
<class 'dict'> {'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}
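
NDJSON parsing assumes each object sits on a single line. The question says an object may span multiple lines; in that case, a minimal sketch using the standard library's json.JSONDecoder.raw_decode can walk through back-to-back JSON documents regardless of line breaks. The helper name iter_json_fragments is hypothetical, and the sketch assumes the whole file fits in memory as one string.

```python
import json
from typing import Any, Iterator

# Whitespace characters allowed between JSON tokens (RFC 8259).
JSON_WHITESPACE = " \t\n\r"


def iter_json_fragments(text: str) -> Iterator[Any]:
    """Yield every JSON value in text, where values are written back to back.

    Hypothetical helper: a value may span several lines, because the decoder
    (not the line breaks) decides where one value ends.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode rejects leading whitespace, so skip it between fragments.
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            return
        # raw_decode returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Demo: the first object is split across two lines.
sample = (
    '{"date": "2022-11-29",\n'
    ' "runs": [{"23597": 821260}, {"23617": 821699}]}\n'
    '{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}\n'
)
records = list(iter_json_fragments(sample))
for record in records:
    print(record["date"], record["runs"])
```

For the file in the question this would be used as list(iter_json_fragments(file.read())) inside the usual with open(...) block. A malformed fragment raises json.JSONDecodeError and stops the iteration.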

Answer:

Each line is valid JSON (see the JSON Lines format). That makes it a good format for logging, because new JSON lines can be appended to the file without the read/modify/write of the whole file that a single JSON document would require.
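
A minimal sketch of the writing side, assuming the log is produced from Python with the standard json module; append_record is a hypothetical helper name. json.dumps without indent keeps each record on one line, and opening the file in "a" mode appends without rewriting earlier records.

```python
import json
import os
import tempfile


def append_record(path: str, record: dict) -> None:
    """Append one record as a single JSON line (hypothetical helper)."""
    # No indent argument, so the whole object is serialized on one line.
    with open(path, "a", encoding="utf8") as f:
        f.write(json.dumps(record) + "\n")


# Demo: append two records to a temporary log file, then read the raw lines back.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "run_log.json")
    append_record(log_path, {"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]})
    append_record(log_path, {"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]})
    with open(log_path, encoding="utf8") as f:
        lines = f.read().splitlines()

for line in lines:
    print(line)
```

Each append only touches the end of the file, which is why this layout suits loggers.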

You can use json.loads() to parse it a line at a time.

Given run_log.json:

{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}
{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}

Use:

import json

with open('run_log.json', encoding='utf8') as file:
    for line in file:
        data = json.loads(line)
        print(data)

Output:

{'date': '2022-11-29', 'runs': [{'23597': 821260}, {'23617': 821699}]}
{'date': '2022-11-30', 'runs': [{'23597': 821269}, {'23617': 8213534}]}
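
For the second part of the question (getting the runs for one date only), the same line-by-line loop can stop at the first matching record. This is a sketch: runs_for_date is a hypothetical helper, it assumes each date appears at most once (it returns the first match), and it skips blank lines, which json.loads would otherwise reject with JSONDecodeError.

```python
import io
import json
from typing import Iterable, List, Optional


def runs_for_date(lines: Iterable[str], date: str) -> Optional[List[dict]]:
    """Return the 'runs' list of the first record whose 'date' equals date, else None."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, which json.loads would reject
        record = json.loads(line)
        if record.get("date") == date:
            return record["runs"]
    return None


# Demo with in-memory data; an open file object can be passed the same way.
sample = io.StringIO(
    '{"date": "2022-11-29", "runs": [{"23597": 821260}, {"23617": 821699}]}\n'
    '{"date": "2022-11-30", "runs": [{"23597": 821269}, {"23617": 8213534}]}\n'
)
result = runs_for_date(sample, "2022-11-30")
print(result)
```

With the real file, pass the file object from with open('run_log.json', encoding='utf8') as file: directly. If a date can occur several times, collect record["runs"] from every match instead of returning early.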