Aim
I have a few different dataframes created with pandas in Python (in a Jupyter notebook). I want to upload them as separate sheets of an Excel workbook straight to Amazon S3.
Reprex
## Creating two example dataframes
import pandas as pd

data1 = {'first_column': ['first_value', 'second_value'],
         'second_column': ['first_value', 'second_value']}
df1 = pd.DataFrame(data1)

data2 = {'first_column': ['xvalue', 'yvalue'],
         'second_column': ['xavalue', 'yavalue']}
df2 = pd.DataFrame(data2)

## Convert them into an Excel workbook and store it locally
with pd.ExcelWriter('fake_file.xlsx') as writer:
    df1.to_excel(writer, sheet_name='df1')
    df2.to_excel(writer, sheet_name='df2')
## Uploading the locally stored Excel Workbook onto S3
import boto3
import pathlib
import os
s3 = boto3.client("s3")
bucket_name = "my_bucket_name"
object_name = "final_fake.xlsx"
# __file__ is not defined inside a notebook, so set it manually
__file__ = "my_python_script.ipynb"
file_name = os.path.join(pathlib.Path(__file__).parent.resolve(), "fake_file.xlsx")
s3.upload_file(file_name, bucket_name, object_name)
Solution sought
How can I create an Excel workbook on S3 from my different dataframes without storing it locally from the Jupyter notebook?
Below, I was able to upload a dataframe straight to S3 as a CSV. How can I do the same, but with each dataframe as a sheet of an Excel workbook?
## Sending one of my dataframes straight to S3 as a CSV
from io import StringIO
import boto3
bucket = "my_bucket_name"
csv_buffer = StringIO()
df1.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df1.csv').put(Body=csv_buffer.getvalue())
CodePudding user response:
You can use a BytesIO object to apply the same basic idea to an xlsx file: write the workbook to memory first, then upload that data to S3:
import pandas as pd
import boto3, io

## Creating two example dataframes
df1 = pd.DataFrame({'first_column': ['first_value', 'second_value'],
                    'second_column': ['first_value', 'second_value']})
df2 = pd.DataFrame({'first_column': ['xvalue', 'yvalue'],
                    'second_column': ['xavalue', 'yavalue']})

## Convert them into an Excel workbook in memory
with io.BytesIO() as xlsx_data:
    with pd.ExcelWriter(xlsx_data) as writer:
        df1.to_excel(writer, sheet_name='df1')
        df2.to_excel(writer, sheet_name='df2')

    ## Upload the in-memory data to S3 while the buffer is still open
    ## (getvalue() raises ValueError on a closed BytesIO)
    s3 = boto3.client("s3")
    bucket_name = "-example-"
    object_name = "final_fake.xlsx"
    s3.put_object(Bucket=bucket_name, Key=object_name, Body=xlsx_data.getvalue())
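As a quick sanity check before the upload (this part needs no AWS credentials), you can read the in-memory workbook back with pd.read_excel and confirm both sheets survived the round trip. The dataframe contents below are the same as in the example; an xlsx engine such as openpyxl is assumed to be installed:

```python
import io
import pandas as pd

# Build the same two-sheet workbook in memory
df1 = pd.DataFrame({'first_column': ['first_value', 'second_value'],
                    'second_column': ['first_value', 'second_value']})
df2 = pd.DataFrame({'first_column': ['xvalue', 'yvalue'],
                    'second_column': ['xavalue', 'yavalue']})

xlsx_data = io.BytesIO()
with pd.ExcelWriter(xlsx_data) as writer:
    df1.to_excel(writer, sheet_name='df1')
    df2.to_excel(writer, sheet_name='df2')

# Rewind and read every sheet back; sheet_name=None returns a dict
# keyed by sheet name, and index_col=0 re-reads the written index
xlsx_data.seek(0)
sheets = pd.read_excel(xlsx_data, sheet_name=None, index_col=0)
print(sorted(sheets))                          # ['df1', 'df2']
print(sheets['df2']['first_column'].tolist())  # ['xvalue', 'yvalue']
```

The same `xlsx_data.getvalue()` bytes are what end up in the `Body=` argument of `put_object`, so if this read-back looks right, the object on S3 will open as a normal two-sheet workbook.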