I have a dataset with invalid dates in one column. The values are formatted as `yyyymmdd` and I need them reformatted to `mm/dd/yyyy`. I tried coercing the values, but that doesn't meet my requirements: these are real data and they need to be printed even though they are not valid dates.
Here's a sample of the data in CSV:
Some values have a day of `00`, and since day 0 doesn't exist, this produces errors when I print the DataFrame.
I tried replacing `errors='coerce'` with `errors='ignore'` just to see whether it would get past the parsing, but it doesn't.
I want to print/reformat the invalid data without coercing the values. Is there any way around this?
Here is my code for that:
```python
df['charge_off_date'] = pd.to_datetime(hals2['charge_off_date'], format='%Y%m%d', errors='ignore')
df['charge_off_date'] = df['charge_off_date'].dt.strftime('%m/%d/%Y')
```
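A likely reason the snippet fails: with `errors='ignore'`, `pd.to_datetime` returns the input unchanged when a value cannot be parsed, so the column is not datetime-typed and the `.dt` accessor raises. (`errors='ignore'` is also deprecated as of pandas 2.2.) One way to use coercion without losing the original values is to coerce only a temporary copy, purely to find out which rows are invalid. The helper `invalid_date_mask`, the `invalid_date` column, and the sample values below are hypothetical; this is a sketch assuming every value follows the `yyyymmdd` layout.

```python
import pandas as pd


def invalid_date_mask(values: pd.Series) -> pd.Series:
    """Return True for entries that are not real yyyymmdd dates.

    Coercion happens on a temporary parsed copy; `values` itself is not modified.
    """
    as_text = values.astype(str)
    # Impossible dates (e.g. day 00) become NaT here instead of raising.
    parsed = pd.to_datetime(as_text, format="%Y%m%d", errors="coerce")
    return parsed.isna()


# Hypothetical sample values; 19000100 has day 00.
df = pd.DataFrame({"charge_off_date": [19000100, 19901120, 20131202]})
df["invalid_date"] = invalid_date_mask(df["charge_off_date"])
print(df)
```

The original `charge_off_date` column keeps its raw values, and the boolean column records which rows could not be parsed.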
Answer:
If a value is invalid, you cannot format it as a date, IMHO. You can treat it as a string, though: since you know it's in `yyyymmdd` format, you can rearrange the string in a custom function and apply it to your column.
```python
def format_invalid_date(d: int) -> str:
    d = str(d)
    return f"{d[:4]}/{d[4:6]}/{d[6:]}"

df['charge_off_date'] = df['charge_off_date'].apply(format_invalid_date)
```
That should convert `19000100` to `1900/01/00`, which is still invalid as a date but looks like a date format.
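The same string-based idea can produce the `mm/dd/yyyy` layout asked for in the question, and pandas' `.str` slicing lets it run over the whole column without `apply`. This is a sketch, not the answerer's code: the helper name `to_mmddyyyy_text` and the sample values are hypothetical, and it assumes every value has exactly eight digits.

```python
import pandas as pd


def to_mmddyyyy_text(values: pd.Series) -> pd.Series:
    """Rearrange yyyymmdd values into mm/dd/yyyy strings without date parsing.

    Invalid dates such as 19000100 pass through as text (01/00/1900).
    """
    s = values.astype(str)
    # Slice year, month and day out of the digit string and reorder them.
    return s.str[4:6] + "/" + s.str[6:8] + "/" + s.str[:4]


df = pd.DataFrame({"charge_off_date": [19000100, 19901120, 20131202]})
df["charge_off_date"] = to_mmddyyyy_text(df["charge_off_date"])
print(df)
```

If the CSV column has blank cells, pandas reads it as floats (e.g. `19000100.0`), which would break the slicing; reading that column as text, for example with `pd.read_csv(..., dtype={'charge_off_date': str})`, avoids this.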
Answer:
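```python
import datetime

def format_invalid_date(date):
    # Split the yyyymmdd value into its parts.
    year = int(str(date)[0:4])
    month = int(str(date)[4:6])
    day = int(str(date)[6:8])
    # Day 00 does not exist, so replace it with 01 (note: this changes the value).
    if day == 0:
        day = 1
    date = datetime.datetime(year, month, day).strftime("%Y/%m/%d")
    return date

df['charge_off_date'] = df['charge_off_date'].apply(format_invalid_date)
```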
Example:
```python
import pandas as pd

df = pd.DataFrame({'charge_off_date': [19000100, 19901120, 20131202]})
df['charge_off_date'] = df['charge_off_date'].apply(format_invalid_date)
print(df)
```
Output:
```
   charge_off_date
0       1900/01/01
1       1990/11/20
2       2013/12/02
```