Issue
I have a dataframe. The template I want to use only has column headings. The column headings in the dataframe are identical to the template column headings. How do I paste the contents of the dataframe into the template excel sheet?
Reprex
Example dataframe
import pandas as pd
data_input = {'Area':['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East","East"],
"Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2","West3", "West9", "West9", "East1", "East4", "East1"],
"Workers": [1,20,30, 2,33,5, 3,6,44, 1, 11, 111],
"Job1":["T", "T", "T", "X","T", "T", "T", "X", "T", "X","T", "T"],
"Job2":["F", "X", "T", "X","T", "F", "T", "X", "F", "X","T", "T"],
"Job3":["T", "F", "T", "X","X", "F", "F", "T","X", "X","T", "T"]}
# Create DataFrame
df1 = pd.DataFrame(data_input)
Attempt #1
# Save dataframe to the template file on S3
with io.BytesIO() as output:
with pd.ExcelWriter(output, engine='openpyxl') as writer:
df1.to_excel(writer, sheet_name='Sheet1',startcol = 0, startrow=2)
data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)
Problem: The above solution just writes my dataset over the template file.
Attempt #2: Appending the dataframe via mode = "a"
# Save dataframe to the template file on S3
# Save file to S3
with io.BytesIO() as output:
# here I add mode = "a"
with pd.ExcelWriter(output, engine='openpyxl', mode = "a") as writer:
df1.to_excel(writer, sheet_name='Sheet1',startcol = 0, startrow=2)
data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)
Problem: Error Message
BadZipFile: File is not a zip file
Attempt 3
In response to a comment from jsn, I tried to first append the df to the template and then load that to S3 but it overwrote all the formatting of the template again.
#downloading template
template = pd.read_excel('s3://main_folder/sub_folder/template.xlsx', sheet_name="Sheet1")
#appending the dataframe
template = template.append(df1)
#now loading to S3
with io.BytesIO() as output:
with pd.ExcelWriter(output, engine='openpyxl') as writer:
template.to_excel(writer, sheet_name='Sheet1')
data = output.getvalue()
s3 = boto3.resource('s3')
s3.Bucket('main_folder').put_object(Key='sub_folder/template.xlsx', Body=data)
Any help would be appreciated
CodePudding user response:
- The
pandas
library may not be suited to storexlsx
formatting state. - The alternative here could be
openpyxl
library which lets you load a workbook and does integrate with pandas to append your data.
You could attempt to do something like this:
from io import BytesIO
from tempfile import NamedTemporaryFile
import boto3
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
# Load Template from S3
bucket_object = boto3.resource('s3').Bucket(bucket_name).Object(object_key)
content = bucket_object.get()['Body'].read()
# Input Data
data_input = {
'Area': ['North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'East', "East", "East"],
"Sub-Area": ["North2", "North1", "North2", "South2", "South1", "South2", "West3", "West9", "West9", "East1",
"East4", "East1"],
"Workers": [1, 20, 30, 2, 33, 5, 3, 6, 44, 1, 11, 111],
"Job1": ["T", "T", "T", "X", "T", "T", "T", "X", "T", "X", "T", "T"],
"Job2": ["F", "X", "T", "X", "T", "F", "T", "X", "F", "X", "T", "T"],
"Job3": ["T", "F", "T", "X", "X", "F", "F", "T", "X", "X", "T", "T"]}
# Create DataFrame
df = pd.DataFrame(data_input)
# Load Workbook
wb = load_workbook(filename=(BytesIO(content)))
ws = wb['Sheet1']
# Append contents of Input Data to Workbook
for r in dataframe_to_rows(df, index=False, header=False):
ws.append(r)
# Save Workbook back to S3
with NamedTemporaryFile() as tmp:
filename = '/tmp/{}'.format(object_key)
wb.save(filename)
s3_resource.Bucket(bucket_name).upload_file(Filename=filename, Key=dest_filename)
Reference material as follows:
- https://openpyxl.readthedocs.io/en/stable/pandas.html
- https://danh-was-here.netlify.app/save-excel-workbook-to-aws-s3-with-python/
Note:
This was tested locally (i.e. without AWS) and the output suggested that formatting applied to the heading columns in the template file remained even after the new data was added.