How to read a csv file from S3 using dask and mentioning my access key and secret?

Time:09-27

I am trying to read a CSV file from S3 using dask, passing my access key and secret explicitly, but I am getting the following error. Can anyone please correct me if I'm doing anything wrong here?

aws_access_key_id = 'xxxx'
aws_secret_access_key = 'xxxx'
df = dd.read_csv('s3://{bucket}/{file_key.csv}', storage_options = {'key': aws_access_key_id, 'secret': aws_secret_access_key})

Error I am facing:

TypeError: sequence item 0: expected str instance, tuple found

CodePudding user response:

You can use boto3 to create an S3 connection with an access key ID and secret access key, download the object, and then load it into a dask DataFrame.

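    import io
    import boto3
    import pandas as pd
    import dask.dataframe as dd

    # boto3 picks up credentials from the environment or its config files
    s3_client = boto3.client('s3')

    # get_object takes keyword arguments; bucket and s3_key are your own values
    response = s3_client.get_object(Bucket=bucket, Key=s3_key)
    file = response["Body"].read()
    # dd.read_csv expects a path, so parse the bytes with pandas first
    df = dd.from_pandas(pd.read_csv(io.BytesIO(file)), npartitions=1)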

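Note: Export the keys as environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), for example via os.environ, so that boto3 can find them.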
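
As a minimal sketch of that note, assuming the keys are exported under the standard names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, the credentials could be read back from the environment and shaped into the storage_options dict used in the question. The helper name storage_options_from_env is hypothetical and not part of boto3 or dask.

```python
import os


def storage_options_from_env(environ=os.environ):
    """Build a storage_options dict from AWS credential environment variables.

    Hypothetical helper for illustration; raises KeyError if a variable is missing.
    """
    return {
        "key": environ["AWS_ACCESS_KEY_ID"],
        "secret": environ["AWS_SECRET_ACCESS_KEY"],
    }
```

The result could then be passed as storage_options=storage_options_from_env() to dd.read_csv, which keeps the keys out of the source file.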

CodePudding user response:

I used boto3, and this is similar to reading a CSV from S3 using pandas. This worked for me!

import boto3
import dask
import dask.dataframe as dd
df = dd.read_csv('s3://*****.csv', storage_options = {'key': 'XXXX', 'secret': 'XXXX'}, assume_missing=True)
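
If the bucket and file key live in variables, the path has to be built with an f-string (or similar); a plain string such as 's3://{bucket}/{file_key.csv}' keeps the braces literally. Below is a rough sketch under that assumption. The function names s3_csv_url and read_s3_csv are made up for illustration, and reading from S3 this way assumes dask and s3fs are installed.

```python
def s3_csv_url(bucket, key):
    """Return an s3:// URL for the given bucket and object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def read_s3_csv(bucket, key, access_key, secret_key):
    """Lazily read a CSV from S3 into a dask DataFrame (needs dask and s3fs)."""
    import dask.dataframe as dd  # imported here so s3_csv_url works without dask

    return dd.read_csv(
        s3_csv_url(bucket, key),
        storage_options={"key": access_key, "secret": secret_key},
        assume_missing=True,
    )
```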