Importing a file with individual JSON lines into a list

Need some help.

I have a JSON file produced by an Auth0 data export. Each line is a separate JSON object, and the lines are not separated by commas.

Below is the file, called OUTPUT_USER_DUMP.json:

{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "[email protected]"}
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "[email protected]"}

What I want to do is open this JSON dump file with a Python script and load its contents into a list variable (the example below shows the list when printed):

[{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "[email protected]"},
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "[email protected]"}]

Any help?

CodePudding user response：

You can read the file line by line and parse each line as JSON:

from json import loads

with open("OUTPUT_USER_DUMP.json", "r") as f2r:
    data = [loads(each_line) for each_line in f2r]
    print(data)

CodePudding user response：

You can read a newline-delimited JSON file with pandas directly. You can then convert it to the format you requested by calling to_dict on the DataFrame.

Code

import pandas as pd

df = pd.read_json('./OUTPUT_USER_DUMP.json', lines=True)
print(df.to_dict('records'))

Output

[
  {'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': '[email protected]'}, 
  {'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': '[email protected]'}
]

CodePudding user response：

Given:

bad_json='''
{
    "user_id": "auth0|5f9886ee8e36ac0069e8fc3a",
    "name": "John Smith",
    "email": "[email protected]"
}
{
    "user_id": "auth0|5fa43f699e937f0068c40d8e",
    "name": "Bob Anderson",
    "email": "[email protected]"
}'''

You can use a regex to insert the missing commas, then wrap the result in brackets:

import re 
import json 

t=re.sub(r"\}\s*\{", "},\n{", bad_json)
new_json=rf'[{t}]'

>>> json.loads(new_json)
[{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': '[email protected]'}, {'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': '[email protected]'}]
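
The regex rewrites every "}" followed by "{", including one that happens to sit inside a string value. A possible alternative, sketched here with the standard library's json.JSONDecoder.raw_decode, parses one JSON value at a time and continues from where the previous one ended. The helper name iter_concatenated_json and the sample text are made up for illustration.

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # raw_decode returns the parsed value and the index where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up sample data; note the "} {" inside a string value.
sample = '''
{
    "user_id": "auth0|1",
    "name": "A } { B"
}
{
    "user_id": "auth0|2",
    "name": "C"
}'''

records = list(iter_concatenated_json(sample))
print(records)
```

Because the decoder tracks string boundaries itself, braces inside values are left alone, and no intermediate string has to be built.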

EDIT

It appears that your file is individual JSON objects, one per line.

Given:

cat file
{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "[email protected]"}
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "[email protected]"}

You can just iterate over the file line-by-line and decode as you go:

import json 

with open('/tmp/file') as f:
    data=[json.loads(line) for line in f]

>>> data
[{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': '[email protected]'}, {'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': '[email protected]'}]
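
If the export might contain blank lines or a damaged record, the same loop can be made a little more defensive. This is only a sketch: the function name load_json_lines and the sample lines are assumptions, not part of the original export.

```python
import json


def load_json_lines(lines):
    """Parse an iterable of JSON lines into a list, skipping blank lines."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Re-raise with the line number so the bad record is easy to find.
            raise ValueError(f"line {lineno}: {exc.msg}") from exc
    return records


# Made-up sample lines; an open file object can be passed the same way.
sample_lines = [
    '{"user_id": "auth0|1", "name": "A"}\n',
    '\n',
    '{"user_id": "auth0|2", "name": "B"}\n',
]
data = load_json_lines(sample_lines)
print(data)
```

With a real file this would be called as load_json_lines(f) inside the with block.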

CodePudding user response：

Since file objects in Python are iterable and yield their lines, you can write a generator function that parses each line as a JSON object:

from json import dumps, loads
from typing import Iterable, Iterator


FILENAME = 'OUTPUT_USER_DUMP.json'


def read_json_objects(file: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from each line of a given file."""

    for line in file:
        if line := line.strip():
            yield loads(line)


def main():
    """Run the script."""

    with open(FILENAME, 'r', encoding='utf-8') as file:
        records = list(read_json_objects(file))

    print(dumps(records, indent=2))


if __name__ == '__main__':
    main()
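
Because read_json_objects is a generator that accepts any iterable of strings, it can also be consumed lazily, for example to keep only some records without building the full list first. In the sketch below, io.StringIO stands in for an open file and the records are made-up sample data; the function is repeated so the snippet runs on its own.

```python
import io
from json import loads
from typing import Iterable, Iterator


def read_json_objects(file: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from each non-blank line of a given file."""
    for line in file:
        if line := line.strip():
            yield loads(line)


# io.StringIO stands in for an open file; the records are made-up sample data.
fake_file = io.StringIO(
    '{"user_id": "auth0|1", "name": "A"}\n'
    '\n'
    '{"user_id": "auth0|2", "name": "B"}\n'
)

# Lines are parsed one at a time; only matching records are kept.
names = [
    user["name"]
    for user in read_json_objects(fake_file)
    if user["user_id"].endswith("2")
]
print(names)
```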