I have JSON files in an S3 bucket (each line of a file is a separate JSON object), and I'm having trouble reading them correctly. Here's what I'm doing:
import json
import boto3

s3 = boto3.client('s3')
response = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)
file = response['Body']
# Iterate over the response body and try to parse each item as one JSON document
for line in file:
    data_json = json.loads(line)
In this case it ignores the \n separators and reads a whole chunk of text as a single line.
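A likely explanation, hedged: in recent botocore versions, iterating directly over a StreamingBody yields fixed-size byte chunks (1024 bytes by default) rather than lines, so one "line" can hold several records or cut a record in half. The sketch below reproduces that effect with the standard library only. `io.BytesIO` stands in for the S3 body, `read_fixed_chunks` is a hypothetical helper, and the 64-byte chunk size is an arbitrary assumption chosen so the effect shows on short input.

```python
import io
import json


def read_fixed_chunks(stream, chunk_size=64):
    """Yield fixed-size byte chunks, ignoring line boundaries (hypothetical helper)."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Two newline-delimited JSON records standing in for the S3 object body (69 bytes total).
body = io.BytesIO(b'{"id": 1, "note": "first record"}\n{"id": 2, "note": "second record"}\n')
chunks = list(read_fixed_chunks(body, chunk_size=64))

# The first chunk contains the whole first record, its newline, and part of the second.
parse_error = None
try:
    json.loads(chunks[0])
except json.JSONDecodeError as exc:
    parse_error = exc  # "Extra data": the chunk is not a single JSON document
print(parse_error)
```

The chunk boundary has nothing to do with the newline positions, which is why parsing per chunk fails.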
How do I properly read the JSON object on each line of the file?
Example input file content (each JSON object is on its own line):
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A001US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A002US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
...
{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A003US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}
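Once each line is parsed on its own, every line becomes a separate dict. A quick way to check the parsing against the sample above is to pull out the nested AccountCode values. This sketch assumes the three complete sample lines are held in an in-memory string instead of being fetched from S3 (the "..." line is left out), and `account_codes` is just an illustrative name.

```python
import json

# The three complete sample lines from the question, joined with newlines.
sample = (
    '{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A001US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}\n'
    '{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A002US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}\n'
    '{"notificationItems":[{"NotificationRequestItem":{"eventCode":"PENDING","AccountCode":"A003US","amount":{"currency":"USD","value":111},"success":"true","method":"xxx","reference":"43535353","date":"2021"}}],"go":"true"}\n'
)

account_codes = []
for line in sample.splitlines():
    record = json.loads(line)
    # Each record wraps its payload in a list of {"NotificationRequestItem": {...}} objects.
    for item in record["notificationItems"]:
        account_codes.append(item["NotificationRequestItem"]["AccountCode"])

print(account_codes)  # ['A001US', 'A002US', 'A003US']
```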
Answer:
boto3's get_object returns a StreamingBody object as the value for Body of the returned dictionary. One of the object's methods is iter_lines, which lets you iterate over the lines of the response as it's read. You can call json.loads on each line from there:
for line in file.iter_lines():
    data = json.loads(line)
    print(data)
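The same per-line loop can be wrapped in a small generator that also tolerates blank lines, which json.loads would otherwise reject. This is a sketch, not part of boto3: `parse_json_lines` is a hypothetical helper name, and `io.BytesIO` stands in for the S3 body because iterating it also yields one bytes line at a time.

```python
import io
import json
from typing import Any, Iterable, Iterator


def parse_json_lines(lines: Iterable[bytes]) -> Iterator[Any]:
    """Parse one JSON document per line, skipping blank lines (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines, which json.loads cannot parse
            continue
        yield json.loads(line)  # json.loads accepts bytes as well as str


# io.BytesIO iterates line by line, standing in for response['Body'].iter_lines() here.
body = io.BytesIO(b'{"id": 1}\n\n{"id": 2}\n')
records = list(parse_json_lines(body))
print(records)  # [{'id': 1}, {'id': 2}]
```

With the real object, the assumed call would be parse_json_lines(response['Body'].iter_lines()), which keeps memory use bounded because the body is streamed rather than read in full.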
Answer:
get_object returns a botocore.response.StreamingBody as the Body value. You need to call .read() on it if the function consuming it cannot take a raw byte stream (see this documentation):
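# The dictionary key is 'Body' (capital B); .read() returns the whole object as bytes
response = s3.get_object(Bucket=SOURCE, Key=key)['Body'].read()
# Iterating over bytes yields integers, so split into lines first
for line in response.splitlines():
    json_data = json.loads(line)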
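The .read() approach loads the entire object into memory, so it suits small files. A sketch, assuming the content is UTF-8: `parse_json_lines_bytes` is a hypothetical helper name, and the byte string `raw` stands in for what .read() returns. It also shows why the result must be split first: iterating a bytes object yields integers, not lines.

```python
import json


def parse_json_lines_bytes(data: bytes, encoding: str = "utf-8") -> list:
    """Decode a whole object body and parse each non-empty line (hypothetical helper)."""
    text = data.decode(encoding)
    # splitlines() handles both \n and \r\n line endings
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Stand-in for s3.get_object(...)['Body'].read()
raw = b'{"AccountCode": "A001US"}\r\n{"AccountCode": "A002US"}\n'

first_item = next(iter(raw))  # 123, the byte value of "{", not a line
events = parse_json_lines_bytes(raw)
print(events)  # [{'AccountCode': 'A001US'}, {'AccountCode': 'A002US'}]
```

For large objects, the streaming iter_lines approach avoids holding the whole file in memory.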