Format dates on an entire column with ExcelWriter and Openpyxl

Time:01-01

I'm trying to write a pandas DataFrame to Excel, with dates formatted as "YYYY-MM-DD", omitting the time. Since I need to write multiple sheets, and I want to use some advanced formatting options (namely setting the column width), I'm using an ExcelWriter object with openpyxl as the engine.

Now, I just can't seem to figure out how to format my date column.

Starting with

import pandas as pd
df = pd.DataFrame({'string_col': ['abc', 'def', 'ghi']})
df['date_col'] = pd.date_range(start='2020-01-01', periods=3)
with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='test', index=False)

This will write the dates as 2020-01-01 00:00:00. For some reason I can't understand, adding datetime_format='YYYY-MM-DD' has no effect if openpyxl is the selected engine (works just fine if engine is left unspecified).

So I'm trying to work around this:

with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='test', index=False)
    writer.sheets['test'].column_dimensions['B'].width = 50
    writer.sheets['test'].column_dimensions['B'].number_format = 'YYYY-MM-DD'

The column width is properly applied, but not the number formatting. On the other hand, it does work applying the style to an individual cell: writer.sheets['test']['B2'].number_format = 'YYYY-MM-DD'.

But how can I apply the formatting to the entire column (I have tens of thousands of cells to format)? I couldn't find anything in the openpyxl documentation on how to address an entire column...

Note: I could do:

for cell in writer.sheets['test']['B']:
    cell.number_format = 'YYYY-MM-DD'

but my point is precisely to avoid iterating over each individual cell.

CodePudding user response:

You can treat your dates as a column of strings and slice it to get 'YYYY-MM-DD':

import pandas as pd

df = pd.DataFrame({'string_col': ['abc', 'def', 'ghi']})

df['date_col'] = pd.date_range(start='2020-01-01', periods=3)
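# Keep only the first 10 characters ('YYYY-MM-DD') of each timestamp's string form.
# Note: the cells will then hold text, not Excel dates.
df['date_col'] = df['date_col'].astype("str").str.slice(start=0, stop=10)

with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='test', index=False)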
    writer.sheets['test'].column_dimensions['B'].width = 50
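
If the column has to stay a real date column (so that Excel can sort and filter it as dates), the string conversion above will not do. As far as I can tell, openpyxl stores the number format on each cell, which would explain why setting it on column_dimensions does not change cells that were already written. A minimal sketch of the per-cell approach from the question, wrapped in a helper; the name write_with_date_format and its parameters are hypothetical, and it assumes index=False so that the first DataFrame column lands in column A:

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter


def write_with_date_format(df, path, sheet_name, date_col, fmt="YYYY-MM-DD", width=50):
    """Hypothetical helper: write df, then format one date column cell by cell."""
    # Column letter of date_col, assuming index=False (first column is 'A').
    letter = get_column_letter(df.columns.get_loc(date_col) + 1)
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        ws = writer.sheets[sheet_name]
        ws.column_dimensions[letter].width = width
        # Skip the header row, then set the number format on every data cell.
        for cell in ws[letter][1:]:
            cell.number_format = fmt


df = pd.DataFrame({"string_col": ["abc", "def", "ghi"]})
df["date_col"] = pd.date_range(start="2020-01-01", periods=3)

out_path = str(Path(tempfile.mkdtemp()) / "test.xlsx")
write_with_date_format(df, out_path, "test", "date_col")

# Read the file back to check which number format the data cells carry.
formats = [c.number_format for c in load_workbook(out_path)["test"]["B"][1:]]
print(formats)
```

This still visits every cell, which the question wanted to avoid, but the loop only sets one attribute per cell and leaves the values as datetimes.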

CodePudding user response:

I know you are using the openpyxl engine, but if you have the flexibility to switch to xlsxwriter, I got it working with that engine.
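
A minimal sketch of what an xlsxwriter version could look like. This is not the original code from this answer; it assumes the xlsxwriter package is installed and relies on the question's observation that datetime_format works when openpyxl is not the engine. The file name and variable names are made up for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"string_col": ["abc", "def", "ghi"]})
df["date_col"] = pd.date_range(start="2020-01-01", periods=3)

xlsx_path = str(Path(tempfile.mkdtemp()) / "test_xlsxwriter.xlsx")

# With this engine, pandas applies datetime_format to the datetime cells it writes.
with pd.ExcelWriter(xlsx_path, engine="xlsxwriter", datetime_format="yyyy-mm-dd") as writer:
    df.to_excel(writer, sheet_name="test", index=False)
    # set_column is the xlsxwriter call for column width (column B, width 50).
    writer.sheets["test"].set_column("B:B", 50)

# openpyxl is used here only to read the file back and inspect the stored format.
xlsxwriter_formats = [
    c.number_format for c in load_workbook(xlsx_path)["test"]["B"][1:]
]
print(xlsxwriter_formats)
```

The column width and the date format are both set without touching individual cells, at the cost of giving up openpyxl as the writing engine.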
