I have an Excel .xlsb sheet with data, some columns have number as output data, other columns should have dates as output. After uploading the data in Python, some columns have a number in stead of date. How can I convert the number in that specific column to a date?
I tried to find an answer, but didn't succeed
CodePudding user response:
I think you still have dates but the numbers are the internal representation of the data.
Nice page explaining the internal representation: https://www.myonlinetraininghub.com/excel-date-and-time
In short the key details from this page:
Excel stores dates and time as a number known as the date serial number, or date-time serial number.
When you look at a date in Excel it's actually a regular number that has been formatted to look like a date. If you change the cell format to 'General' you'll see the underlying date serial number.
The integer portion of the date serial number represents the day, and the decimal portion is the time. Dates start from 1st January 1900 i.e. 1/1/1900 has a date serial number of 1.
So finally you just have to add the number of days to the starting date 1.1.1900 (for the python code see here: Adding days to a date in Python)
CodePudding user response:
In Excel the date is represented as the integer part of the number, and counts the number of days since 1899-12-31. For example 1 is 1900-01-01.
However you need to be aware there is a "bug" in Excel, it wrongly considers 1900 to be a leap year, i.e. it counts the date 1900-02-29 as a valid date. This bug was intentionally added to Excel so it would reproduce the wrong behaviour of Lotus, that was the most popular Windows spreadsheet software at the time Excel was created. So assuming you do not care about dates before 1900-03-01, then you could instead assume the number is a count of the number of days since 1899-12-30.
def xldate2date(xl):
# valid for dates from 1900-03-01
basedate = datetime.date(1899,12,30)
d = basedate datetime.timedelta(days=xl)
return d