I have a function that reads an excel file into a dataframe and then I save that dataframe in an s3 bucket using the awswrangler api to_csv function. The excel file has data starting from different rows and columns.
My conversion code looks something like this:
df = wr.s3.read_excel(
path = 's3://bucket/filepath/',
sheet_name = 'sheetname',
header = 5,
index_col = False).iloc[:, 3:]
wr.s3.to_csv(df, path = "s3://bucket/filepath/',
dataset = True,
filename_prefix = 'sheetname')
The code works properly but I am having 2 issues:
- The csv outputs an additional index column even though I have specified
index_col = False
- The file csv output name comes with additional serial numbers which I dont want. I just want something like
sheetname.csv
. I looked at the documentation and didn't find any good leads on how I can do that.
The current output looks something like sheetname8f1e8wefdf15f4wgfe5fef46we54f.csv
CodePudding user response:
You can specify the full path in the path
parameter. You don't need to use filename_prefix
at all. Using filename_prefix
is why you are getting those extra characters in the filename.
Also, you can disable the index in the output by setting index = False
.
wr.s3.to_csv(df, path = "s3://bucket/filepath/sheetname.csv",
index = False)