Need some help.
I have a JSON file that is the result of an Auth0 data export. The lines are not separated by commas; each line is a separate JSON object.
Below is the file, called OUTPUT_USER_DUMP.json:
{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "[email protected]"}
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "[email protected]"}
What I want to do is open this JSON dump file in a Python script and load its contents into a list variable, so that printing the list gives the following:
[{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "[email protected]"},
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "[email protected]"}]
Any help?
CodePudding user response:
You can read the file line by line and parse each line as JSON:
from json import loads
with open("OUTPUT_USER_DUMP.json", "r") as f2r:
    data = [loads(each_line) for each_line in f2r]
print(data)
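One caveat with the comprehension above: an empty line anywhere in the file (for example an extra newline at the end of the dump) makes `loads` raise `json.JSONDecodeError`. Below is a minimal sketch that skips blank lines and reports which line failed to parse. It assumes the file is UTF-8 encoded, and the function name `load_json_lines` is hypothetical:

```python
import json
from typing import Any, List


def load_json_lines(path: str) -> List[Any]:
    """Parse a newline-delimited JSON file into a list, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        # enumerate from 1 so error messages match editor line numbers
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate empty lines instead of crashing
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # re-raise with the file name and line number for easier debugging
                raise ValueError(f"{path}:{lineno}: invalid JSON ({exc.msg})") from exc
    return records
```

For a well-formed dump, `load_json_lines("OUTPUT_USER_DUMP.json")` returns the same list as the one-line comprehension.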
CodePudding user response:
You can read a newline-delimited JSON file directly with pandas by passing lines=True to read_json. To get the list format you asked for, call to_dict('records') on the resulting DataFrame.
Code
import pandas as pd
df = pd.read_json('./OUTPUT_USER_DUMP.json', lines=True)
print(df.to_dict('records'))
Output
[
{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': '[email protected]'},
{'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': '[email protected]'}
]
CodePudding user response:
Given:
bad_json='''
{
"user_id": "auth0|5f9886ee8e36ac0069e8fc3a",
"name": "John Smith",
"email": "[email protected]"
}
{
"user_id": "auth0|5fa43f699e937f0068c40d8e",
"name": "Bob Anderson",
"email": "[email protected]"
}'''
You can use a regex to insert the missing commas between the objects, then wrap the result in square brackets:
import re
import json
t=re.sub(r"\}\s*\{", "},\n{", bad_json)
new_json=rf'[{t}]'
>>> json.loads(new_json)
[{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': '[email protected]'}, {'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': '[email protected]'}]
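The regex assumes that `}` followed by optional whitespace and `{` never occurs inside a string value; a name such as `"a}{b"` would be split in the wrong place. As an alternative that avoids text substitution, here is a sketch that uses the standard library's `json.JSONDecoder.raw_decode` to parse one document at a time off the string, which handles both the pretty-printed layout and the one-object-per-line layout. The function name `parse_concatenated_json` is hypothetical. The optional start-index argument passed to `raw_decode` is present in CPython's implementation, although the documentation only lists `raw_decode(s)`:

```python
import json
import re
from typing import Any, List

# JSON's insignificant whitespace: space, tab, newline, carriage return
_WHITESPACE = re.compile(r"[ \t\n\r]*")


def parse_concatenated_json(text: str) -> List[Any]:
    """Split a string of back-to-back JSON documents into a list of objects."""
    decoder = json.JSONDecoder()
    objects = []
    idx = _WHITESPACE.match(text, 0).end()  # skip leading whitespace
    while idx < len(text):
        # raw_decode parses one document starting at idx and returns
        # the parsed object plus the index just past its end
        obj, idx = decoder.raw_decode(text, idx)
        objects.append(obj)
        idx = _WHITESPACE.match(text, idx).end()  # skip the gap before the next document
    return objects
```

Because the decoder tracks string boundaries itself, braces inside values are never mistaken for object boundaries.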
EDIT
It appears that your file consists of lines, each holding an individual JSON object.
Given:
cat /tmp/file
{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "[email protected]"}
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "[email protected]"}
You can just iterate over the file line-by-line and decode as you go:
import json
with open('/tmp/file') as f:
    data = [json.loads(line) for line in f]
>>> data
[{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': '[email protected]'}, {'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': '[email protected]'}]
CodePudding user response:
Since file objects in Python are iterable and yield their lines, you can write a generator function that parses each non-blank line as a JSON object:
from json import dumps, loads
from typing import Iterable, Iterator
FILENAME = 'OUTPUT_USER_DUMP.json'
def read_json_objects(file: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from each line of a given file."""
    for line in file:
        # skip blank lines; the walrus operator requires Python 3.8+
        if line := line.strip():
            yield loads(line)
def main():
    """Run the script."""
    with open(FILENAME, 'r', encoding='utf-8') as file:
        users = list(read_json_objects(file))
    print(dumps(users, indent=2))
if __name__ == '__main__':
    main()
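Because `read_json_objects` only needs an iterable of strings and yields records lazily, the same generator works on in-memory data and can stop early without parsing the whole dump. The sketch below repeats the generator so it runs on its own, uses `io.StringIO` as a stand-in for the opened file, and uses `itertools.islice` to pull only the first record; the sample data and the variable names `sample` and `first_users` are illustrative:

```python
import io
from itertools import islice
from json import loads
from typing import Iterable, Iterator


def read_json_objects(file: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from each non-blank line (same logic as above)."""
    for line in file:
        if line := line.strip():
            yield loads(line)


# io.StringIO stands in for the opened dump file; a plain list of strings would also work
sample = io.StringIO(
    '{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a", "name": "John Smith"}\n'
    "\n"
    '{"user_id": "auth0|5fa43f699e937f0068c40d8e", "name": "Bob Anderson"}\n'
)

# islice stops after the first record, so the remaining lines are never parsed
first_users = list(islice(read_json_objects(sample), 1))
print(first_users[0]["name"])  # John Smith
```

This is why the generator version scales better than building the whole list when you only need to filter or sample a large export.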