I am trying to read a CSV file from S3 using Dask, but I am getting the following error. Can anyone tell me if I'm doing anything wrong here?
import dask.dataframe as dd

aws_access_key_id = 'xxxx'
aws_secret_access_key = 'xxxx'

df = dd.read_csv('s3://{bucket}/{file_key.csv}',
                 storage_options={'key': aws_access_key_id, 'secret': aws_secret_access_key})
The error I am facing:
TypeError: sequence item 0: expected str instance, tuple found
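One thing worth checking: the literal `{bucket}` and `{file_key.csv}` braces in the path look like placeholders that were passed verbatim instead of being substituted. A minimal sketch of building the real URL first, using hypothetical bucket and key names:

```python
# Hypothetical bucket and key names, for illustration only
bucket = "my-bucket"
file_key = "data/file.csv"

# Substitute the real values into the URL instead of passing
# the literal "{bucket}/{file_key.csv}" placeholder string
url = f"s3://{bucket}/{file_key}"
print(url)  # → s3://my-bucket/data/file.csv
```

The resulting string is what `dd.read_csv` should receive as its first argument.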
CodePudding user response:
You can use boto3 to create an S3 connection with the access key ID and secret access key.
import io

import boto3
import dask.dataframe as dd
import pandas as pd

# boto3 reads credentials from the environment
# (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY)
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket=bucket, Key=s3_key)
file = response["Body"].read()

# dd.read_csv expects a path, not a file-like object, so parse the
# bytes with pandas first and wrap the result in a Dask DataFrame
df = dd.from_pandas(pd.read_csv(io.BytesIO(file)), npartitions=1)
Note: export the keys as environment variables (or set them via os.environ) rather than hard-coding them in the script.
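The note above can be sketched as follows; the key values are placeholders, and the variable names are the standard ones boto3 looks up automatically:

```python
import os

# Placeholder credential values, for illustration only
os.environ["AWS_ACCESS_KEY_ID"] = "xxxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxxx"

# boto3.client('s3') will now pick the credentials up from the
# environment, so nothing sensitive needs to live in the source
print(os.environ["AWS_ACCESS_KEY_ID"])  # → xxxx
```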
CodePudding user response:
I passed the credentials through storage_options, which is similar to reading a CSV from S3 with pandas. This worked for me!
import dask.dataframe as dd

# assume_missing=True makes Dask infer unspecified integer columns as
# floats, which avoids dtype errors when a column contains missing values
df = dd.read_csv('s3://*****.csv',
                 storage_options={'key': 'XXXX', 'secret': 'XXXX'},
                 assume_missing=True)