I have a DataFrame that contains a datetime64
and a timedelta64
. Unfortunately, I can't export the latter to a properly formatted hh:mm:ss
column in an Excel file:
import pandas as pd
data = {
"date": [
"2023-02-05",
"2023-02-05",
"2022-12-02",
"2022-11-29",
"2022-11-18",
],
"duration": [
"01:07:48",
"05:23:06",
"02:41:58",
"00:35:11",
"02:00:20",
],
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration'])
with pd.ExcelWriter(
"df.xlsx",
datetime_format="YYYY-MM-DD",
engine="xlsxwriter",
) as writer:
workbook = writer.book
time_format = workbook.add_format({"num_format": "HH:MM:SS"})
df.to_excel(writer, sheet_name="sheet", index=False)
worksheet = writer.sheets["sheet"]
worksheet.set_column("A:A", 20)
worksheet.set_column("B:B", 50, cell_format=time_format)
The resulting Excel file will display like this:
So, the date_time
format in the ExcelWriter
object is applied correctly for column A, as well as the width setting for column B, but the number formatting isn't working.
What am I doing wrong?
CodePudding user response:
The problem is that Excel is measuring your time in day units. For example, for your first value, (1:07:48 = 4068s) you are getting a duration of (4048/(24*3600)) days.
You have the posible solutions here:
You could also covert the timedelta back to a number (like Pandas does) since dates or times in Excel are just numbers anyway with a format.
Something like this, which will give the same result as above:
f = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['duration'] = pd.to_timedelta(df['duration']).dt.total_seconds() / 86400
with pd.ExcelWriter(
"df.xlsx",
datetime_format="YYYY-MM-DD",
engine="xlsxwriter",
) as writer:
workbook = writer.book
time_format = workbook.add_format({"num_format": "HH:MM:SS"})
df.to_excel(writer, sheet_name="sheet", index=False)
worksheet = writer.sheets["sheet"]
worksheet.set_column("A:A", 20)
worksheet.set_column("B:B", 50, cell_format=time_format)