Python convert a date string to datetime recognised by Excel

Time:01-27

I'm using Python and pandas to query a table in SQL, store it in a DataFrame, then write it to an Excel file (.xlsx).

I'm then using a couple of VBA macros to loop through the columns and do some conditional formatting to highlight outliers.

Everything works fine except the date column, which Excel gets stuck on, raising this error:

"Method 'Average' of object 'WorksheetFunction' failed"

The date is stored as a string in the format '20-01-2022', which is presumably causing the error, so I need to convert it to an actual datetime format that Excel will recognise when the file is opened.

Example:

import pandas as pd

df = pd.DataFrame([[1, '21-06-2022'], [2, '19-08-2022'], [3, '06-04-2022']], columns=['id', 'date'])

df.to_excel("output.xlsx")

If you then open "output.xlsx" and try to use conditional formatting on the 'date' column, or try =AVERAGE(C2:C4), either nothing happens or you get an error. If you double-click into a cell, Excel suddenly recognises it as a date, but that isn't practical with thousands of cells.

How can I convert the dates to a format that Excel will recognise immediately upon opening the file?

CodePudding user response:

Before saving your df to Excel, you need to parse those date strings into real dates. Note that they are day-first (DD-MM-YYYY, e.g. '20-01-2022'), not ISO 8601.
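To see why the parsing matters: Excel stores a real date as a serial number of days, which is what functions like AVERAGE operate on, whereas a text cell has no numeric value. The following is a minimal stdlib-only sketch, assuming the workbook uses Excel's default 1900 date system (where, for modern dates, the serial is the number of days since 30 December 1899). The name `excel_serial` is a hypothetical helper, not part of pandas or Excel.

```python
from datetime import date, datetime

# Assumption: Excel's default 1900 date system; valid for dates after Feb 1900.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(text: str, fmt: str = "%d-%m-%Y") -> int:
    """Parse a day-month-year string and return its Excel serial day number."""
    parsed = datetime.strptime(text, fmt).date()
    return (parsed - EXCEL_EPOCH).days


strings = ["21-06-2022", "19-08-2022", "06-04-2022"]
serials = [excel_serial(s) for s in strings]

# An average is only meaningful on numbers like these, not on the text cells.
mean_serial = sum(serials) / len(serials)
print(serials, mean_serial)
```

Once the column holds real dates, pandas and the Excel writer take care of this conversion, so the helper is only there to illustrate what Excel expects.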

There are several ways to do that.

You can use the pandas.read_sql keyword argument parse_dates to parse specific columns as dates while reading, optionally giving a format per column. For strings like '20-01-2022' the format is "%d-%m-%Y".

import pandas as pd

df = pd.read_sql(
    sql,
    con,
    parse_dates={
        "<col1>": {"format": "%d-%m-%Y"},  # e.g. '20-01-2022'
        "<col2>": {"format": "%d/%m/%y"},  # e.g. '20/01/22'
    },
)
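As a self-contained illustration of the parse_dates approach, here is a sketch that uses an in-memory SQLite database in place of the real SQL source. The table name `readings` and its contents are assumptions made up for the example; only the `parse_dates` argument reflects the technique described above.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for the real SQL source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (id INTEGER, date TEXT)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, "21-06-2022"), (2, "19-08-2022"), (3, "06-04-2022")],
)

# The per-column dict is passed through to pd.to_datetime as keyword arguments.
df = pd.read_sql(
    "SELECT id, date FROM readings",
    con,
    parse_dates={"date": {"format": "%d-%m-%Y"}},
)
con.close()

print(df.dtypes)
```

With the explicit format, '06-04-2022' is read as 6 April rather than 4 June.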

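Same as above, but without a format: the columns are parsed as datetimes and the dates can then be extracted. Without a format pandas has to guess the layout, so ambiguous day-first strings such as '06-04-2022' may be read as month-first.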

import pandas as pd

df = pd.read_sql(sql, con, parse_dates=["<col1>", "<col2>"])
# .dt is a Series accessor, so apply it column by column
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(lambda s: s.dt.date)

You can load the data first, then parse each column manually with pd.to_datetime, and again keep only the dates.

import pandas as pd

df = pd.read_sql(sql, con)
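# pd.to_datetime on a whole DataFrame expects year/month/day columns,
# so parse each string column separately
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(
    lambda s: pd.to_datetime(s, format="%d-%m-%Y").dt.date
)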
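Applied to the example DataFrame from the question, the manual approach might look like the sketch below. It keeps the column as datetime64 rather than extracting `.dt.date`; as far as I know both are written to Excel as date cells, but treat that as an assumption to check against your Excel engine. The `to_excel` call is left commented out because it needs an engine such as openpyxl to be installed.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "21-06-2022"], [2, "19-08-2022"], [3, "06-04-2022"]],
    columns=["id", "date"],
)

# An explicit format avoids any day/month guessing.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

print(df.dtypes)

# Writing requires an Excel engine such as openpyxl (assumed installed):
# df.to_excel("output.xlsx")
```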

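Or, if the strings are already in ISO 8601 form (YYYY-MM-DD), you could just parse with datetime.date.fromisoformat. It will reject day-first strings such as '20-01-2022'.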

import pandas as pd
from datetime import date

df = pd.read_sql(sql, con)
# DataFrame.applymap is deprecated since pandas 2.1; map each column instead
df[["<col1>", "<col2>"]] = df[["<col1>", "<col2>"]].apply(lambda s: s.map(date.fromisoformat))
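A quick stdlib check of what `date.fromisoformat` accepts, with `datetime.strptime` as the alternative for day-first strings. The helper name `parse_day_first` is hypothetical and only used for illustration.

```python
from datetime import date, datetime


def parse_day_first(text: str) -> date:
    """Parse 'DD-MM-YYYY' text into a date (illustrative helper)."""
    return datetime.strptime(text, "%d-%m-%Y").date()


# ISO 8601 input is accepted.
iso_ok = date.fromisoformat("2022-06-21")

# Day-first input is not valid ISO 8601 and raises ValueError.
try:
    date.fromisoformat("21-06-2022")
    rejected = False
except ValueError:
    rejected = True

day_first = parse_day_first("21-06-2022")
print(iso_ok, rejected, day_first)
```

A function like `parse_day_first` could be passed to the column-wise map above in place of `date.fromisoformat`.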

NB: the methods are listed in no particular order, but the first seems slightly faster than the others, while also being the most elegant (in my opinion).
