How can I read the contents of a gzip file stored in AWS S3 into a pandas DataFrame in Python?
I want to convert the file's contents into a DataFrame.
Answer:
If the gzipped files contain JSON Lines data (one JSON object per line), the code below loads every object under a prefix into a single DataFrame:
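```python
import gzip
from io import StringIO

import boto3
import pandas as pd

bucket = 'bucket_name'   # your bucket
prefix = 'Folder name'   # your "folder" (key prefix)

# Credentials are read from the environment or ~/.aws by default; pass
# aws_access_key_id / aws_secret_access_key here only if you must.
client = boto3.client('s3')

# list_objects_v2 returns at most 1000 keys per call, so paginate.
list_keys = []
for page in client.get_paginator('list_objects_v2').paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get('Contents', []):  # empty prefixes have no 'Contents'
        list_keys.append(obj['Key'])

lst = []
for key in list_keys:
    try:
        body = client.get_object(Bucket=bucket, Key=key)['Body']
        # Decompress the streamed object and parse it as JSON Lines.
        with gzip.GzipFile(fileobj=body) as gzipfile:
            temp_data = pd.read_json(StringIO(gzipfile.read().decode('utf-8')), lines=True)
        lst.append(temp_data)
    except (OSError, ValueError) as e:
        # Skip objects that are not valid gzip/JSON, but report them
        # instead of silently swallowing the error.
        print(f'Skipping {key}: {e}')

df = pd.concat(lst, ignore_index=True)
```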
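The same steps can be split into small functions so the decompress-and-parse part can be tried without touching S3. This is a minimal sketch, assuming each object is gzip-compressed UTF-8 JSON Lines; the function names `iter_keys`, `gz_jsonl_to_df` and `load_prefix` are hypothetical, and `client` is assumed to be a boto3 S3 client (or any object with the same `get_paginator` / `get_object` methods).

```python
import gzip
import io
from typing import Iterator

import pandas as pd


def iter_keys(client, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def gz_jsonl_to_df(data: bytes) -> pd.DataFrame:
    """Decompress gzip bytes and parse the text as JSON Lines."""
    text = gzip.decompress(data).decode("utf-8")
    return pd.read_json(io.StringIO(text), lines=True)


def load_prefix(client, bucket: str, prefix: str) -> pd.DataFrame:
    """Read every gzipped JSON Lines object under `prefix` into one DataFrame."""
    frames = []
    for key in iter_keys(client, bucket, prefix):
        data = client.get_object(Bucket=bucket, Key=key)["Body"].read()
        frames.append(gz_jsonl_to_df(data))
    # ignore_index renumbers rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)
```

Usage: `df = load_prefix(boto3.client('s3'), 'bucket_name', 'Folder name')`. If the `s3fs` package is installed, pandas should also be able to read a single `s3://bucket/key.json.gz` URL directly with `pd.read_json(url, lines=True)`, inferring gzip compression from the `.gz` extension.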