I have an Excel .xlsb sheet with data. Some columns contain numbers, while other columns should contain dates. After loading the data in Python, some columns show a number instead of a date. How can I convert the numbers in that specific column to dates? I use pandas and ddf.
The date of birth column ('dob_l1') of the dataframe shows '12150', which should be the date '6-4-1933'.
I tried to solve this, but I only managed to get the date '2050-01-12', which is incorrect. I used this code: ddf['nwdob_l1'] = pd.to_datetime(ddf['dob_l1'], format='%d%m%y', errors='coerce')
Who can help me? I was happy to receive some good feedback from joe90. He showed me a function that works for single dates:
import datetime
def xldate2date(xl):
    # valid for dates from 1900-03-01
    basedate = datetime.date(1899, 12, 30)
    d = basedate + datetime.timedelta(days=xl)
    return d
# Example:
# >>> print(xldate2date(44948))
# 2023-01-22
That is correct; however, I need to convert all values in the column (more than 500,000), so I cannot do that one by one. As that question is closed, I am opening a new one. Can anyone help me find the correct code to get the right date for the whole column?
CodePudding user response:
When you read the data in with pandas, it can parse the dates for you. Use the parse_dates argument of read_excel.
Example:
import pandas as pd
df = pd.read_excel('file/path/the.xlsx', parse_dates=['Date'])
This converts the column to the datetime64 dtype, which is easier to work with than a raw number.
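If the column still comes back as plain numbers (as far as I know this can happen with .xlsb files, since the pyxlsb engine returns dates as floats), the serial numbers can be converted for the whole column at once. The sketch below assumes ddf is an ordinary pandas DataFrame and that the values are Excel serial day numbers counted from 1899-12-30, the same base date as in joe90's function. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the real data (assumption: plain pandas DataFrame)
ddf = pd.DataFrame({"dob_l1": [12150, 44948, None]})

# Treat each number as a count of days since 1899-12-30 (Excel's base date
# for serials from 1900-03-01 onward); missing or invalid values become NaT.
ddf["nwdob_l1"] = pd.to_datetime(
    ddf["dob_l1"], unit="D", origin="1899-12-30", errors="coerce"
)

print(ddf)
```

This is vectorised, so it should handle 500,000+ rows without a Python loop. The earlier attempt with format='%d%m%y' gave 2050-01-12 because it read the digits of 12150 as day 12, month 1, year 50 rather than as a day count.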