Aim
I have a few different dataframes created with pandas in Python (in a Jupyter notebook). I want to upload them as separate sheets of an Excel workbook straight to Amazon S3.
Reprex
## Creating two example dataframes
import pandas as pd

data1 = {'first_column': ['first_value', 'second_value'],
         'second_column': ['first_value', 'second_value']}
df1 = pd.DataFrame(data1)

data2 = {'first_column': ['xvalue', 'yvalue'],
         'second_column': ['xavalue', 'yavalue']}
df2 = pd.DataFrame(data2)

## Convert them into an Excel workbook and store it locally
with pd.ExcelWriter('fake_file.xlsx') as writer:
    df1.to_excel(writer, sheet_name='df1')
    df2.to_excel(writer, sheet_name='df2')
## Uploading the locally stored Excel Workbook onto S3
import boto3
import pathlib
import os
s3 = boto3.client("s3")
bucket_name = "my_bucket_name"
object_name = "final_fake.xlsx"
# __file__ is not defined inside a notebook, so set it manually
__file__ = "my_python_script.ipynb"
file_name = os.path.join(pathlib.Path(__file__).parent.resolve(), "fake_file.xlsx")
s3.upload_file(file_name, bucket_name, object_name)
Solution sought
How can I create an Excel workbook on S3 from my different dataframes without storing it locally from the Jupyter notebook?
Below, I was able to upload a dataframe straight to S3 as a CSV. How can I do the same, but with each dataframe as a sheet of an Excel workbook?
## Sending one of my dataframes straight to S3 as a CSV
from io import StringIO
import boto3
bucket = "my_bucket_name"
csv_buffer = StringIO()
df1.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df1.csv').put(Body=csv_buffer.getvalue())
CodePudding user response:
You can use a BytesIO object to apply the same basic idea to an xlsx file: write the workbook to memory first, then upload that data to S3:
import pandas as pd
import boto3, io

## Creating two example dataframes
df1 = pd.DataFrame({'first_column': ['first_value', 'second_value'],
                    'second_column': ['first_value', 'second_value']})
df2 = pd.DataFrame({'first_column': ['xvalue', 'yvalue'],
                    'second_column': ['xavalue', 'yavalue']})

## Convert them into an Excel workbook in memory
with io.BytesIO() as xlsx_data:
    with pd.ExcelWriter(xlsx_data) as writer:
        df1.to_excel(writer, sheet_name='df1')
        df2.to_excel(writer, sheet_name='df2')

    ## Upload the in-memory data to S3 while the buffer is still open
    ## (getvalue() raises ValueError on a closed BytesIO)
    s3 = boto3.client("s3")
    bucket_name = "-example-"
    object_name = "final_fake.xlsx"
    s3.put_object(Bucket=bucket_name, Key=object_name, Body=xlsx_data.getvalue())
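As a quick sanity check before the upload (this part needs no AWS credentials), you can read the in-memory workbook back with pd.read_excel and confirm both sheets survived the round trip. The dataframe contents below are the same as in the example; an xlsx engine such as openpyxl is assumed to be installed:

```python
import io
import pandas as pd

# Build the same two-sheet workbook in memory
df1 = pd.DataFrame({'first_column': ['first_value', 'second_value'],
                    'second_column': ['first_value', 'second_value']})
df2 = pd.DataFrame({'first_column': ['xvalue', 'yvalue'],
                    'second_column': ['xavalue', 'yavalue']})

xlsx_data = io.BytesIO()
with pd.ExcelWriter(xlsx_data) as writer:
    df1.to_excel(writer, sheet_name='df1')
    df2.to_excel(writer, sheet_name='df2')

# Rewind and read every sheet back; sheet_name=None returns a dict
# keyed by sheet name, and index_col=0 re-reads the written index
xlsx_data.seek(0)
sheets = pd.read_excel(xlsx_data, sheet_name=None, index_col=0)
print(sorted(sheets))                          # ['df1', 'df2']
print(sheets['df2']['first_column'].tolist())  # ['xvalue', 'yvalue']
```

The same `xlsx_data.getvalue()` bytes are what end up in the `Body=` argument of `put_object`, so if this read-back looks right, the object on S3 will open as a normal two-sheet workbook.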