Python, AWS S3: how to read a file with one JSON object per line

Time:10-08

I have JSON files in an S3 bucket (each row of a file is a JSON object), and I'm having trouble reading them correctly. What I'm doing:

s3 = boto3.client('s3')
response = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)
file = response['Body']
for line in file:
    data_json = json.loads(line, encoding='utf-8')

In this case it ignores \n and reads a bunch of text as a single line.

How do I properly read the JSON object on each line of the file?

Example of the input file content (a file with one JSON object per row):

{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A001US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A002US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
...
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A003US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}

CodePudding user response:

boto3's get_object returns a StreamingBody object as the value of Body in the returned dictionary.

One of the methods of the object is an iter_lines method that allows you to iterate over the lines of the response as it's read. You can call json.loads on each line from there:

for line in file.iter_lines():
    data = json.loads(line)
    print(data)
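
As a slightly fuller sketch of the same idea, the loop can be wrapped in a small generator. The name parse_json_lines is made up for illustration, and skipping blank lines is an assumption about what you want for files with trailing or empty rows. It only relies on the body having an iter_lines() method that yields bytes:

```python
import json


def parse_json_lines(body):
    """Yield one parsed object per non-empty line of a streaming body.

    `body` is expected to behave like the StreamingBody returned under
    response['Body'], i.e. to offer iter_lines() yielding bytes.
    """
    for raw_line in body.iter_lines():
        # Skip blank rows (for example a trailing newline at end of file).
        if not raw_line.strip():
            continue
        # json.loads accepts bytes directly on Python 3.6+.
        yield json.loads(raw_line)


# Usage against S3 (needs credentials; SOURCE_BUCKET and key come from the question):
# import boto3
# s3 = boto3.client('s3')
# body = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)['Body']
# for record in parse_json_lines(body):
#     print(record['go'])
```

Because it is a generator, rows are parsed as they are streamed, so the whole file never has to sit in memory at once.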

CodePudding user response:

get_object returns a botocore.response.StreamingBody. You need to call .read() on it if the function consuming it cannot take a raw byte stream (see this documentation).

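# The key is 'Body' (capital B); read() returns the whole object as bytes
response = s3.get_object(Bucket=SOURCE, Key=key)['Body'].read()
# Iterating over bytes directly yields integers, so split into lines first
for line in response.splitlines():
    json_data = json.loads(line)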
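
If reading the whole object into memory is acceptable, the parsing step can be kept separate from the S3 call so it is easy to check on its own. This is only a sketch: parse_json_lines_from_bytes is a made-up helper name, and it assumes the payload is the bytes value returned by .read(), with one JSON object per line:

```python
import json


def parse_json_lines_from_bytes(payload):
    """Parse a bytes payload holding one JSON object per line into a list."""
    records = []
    # bytes.splitlines() splits on \n, \r and \r\n and drops the line endings.
    for line in payload.splitlines():
        if line.strip():  # ignore empty rows
            records.append(json.loads(line))
    return records


# Usage with the payload from the snippet above (needs AWS credentials):
# payload = s3.get_object(Bucket=SOURCE, Key=key)['Body'].read()
# records = parse_json_lines_from_bytes(payload)
```

For large files the iter_lines approach is likely the better fit, since .read() loads everything at once.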