Reading the contents of a gzip file stored in AWS S3

Time:11-17

I am trying to read the contents of a gzip file stored in AWS S3 using Python.

I want to load the contents into a pandas DataFrame.

CodePudding user response:

If you are trying to load JSON data into a DataFrame, here is the code. It assumes each gzip file holds one JSON object per line, which is what lines=True expects.

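import gzip
from io import StringIO

import boto3
import pandas as pd

# Credentials are left blank here; supply your own.
resource = boto3.resource('s3', aws_access_key_id='',
                          aws_secret_access_key='')
# The original snippet used `client` without defining it; take it from the resource.
client = resource.meta.client

# Collect the object keys under the prefix (list_objects returns at most 1000 keys per call).
list_keys = []
lst = []
for key in client.list_objects(Bucket='bucket_name', Prefix='Folder name')['Contents']:
    list_keys.append(key["Key"])

for key in list_keys:
    try:
        obj = resource.Object("bucket_name", key)
        # Decompress the streamed body and parse it as line-delimited JSON.
        with gzip.GzipFile(fileobj=obj.get()["Body"]) as gzipfile:
            temp_data = pd.read_json(StringIO(gzipfile.read().decode('UTF-8')), lines=True)
            lst.append(temp_data)
    except Exception as e:
        # Report and skip objects that cannot be read or parsed instead of hiding the error.
        print(f"Skipping {key}: {e}")

# Combine the per-file frames into a single DataFrame.
df = pd.concat(lst, ignore_index=True)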
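
To check the decompress-and-parse step without touching S3, a minimal sketch like the one below may help. It uses only the standard library and assumes the same line-delimited JSON layout as the answer above. The helper name gzip_jsonl_to_records and the sample payload are hypothetical, introduced only for illustration; in real use the bytes would come from obj.get()["Body"].read().

```python
import gzip
import io
import json


def gzip_jsonl_to_records(raw_bytes):
    """Decompress gzip bytes and parse one JSON object per non-empty line."""
    records = []
    # gzip.open accepts a file-like object; "rt" mode yields decoded text lines.
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                records.append(json.loads(line))
    return records


# Hypothetical in-memory payload standing in for the S3 object body.
sample = gzip.compress(b'{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')
records = gzip_jsonl_to_records(sample)
print(records)
```

The resulting list of dicts can be passed to pd.DataFrame(records) to get a DataFrame. Parsing line by line also makes it easier to see which line is malformed when a file fails to load.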