Is there any way you can parse/reformat an invalid date data in a column in Python without coercing?

Time:12-31

I have a dataset with invalid dates in one column. The values are formatted as yyyymmdd and I need to reformat them as mm/dd/yyyy. I tried coercing the values, but that doesn't meet the requirement: this is data, and it needs to be printed out even when it is not a valid date.

Here's a sample of the data in CSV:

The data contains a day of '00'. Day 0 does not exist, so I get errors when printing the DataFrame.
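The failure can be reproduced without pandas. The following is only a minimal sketch using the standard library; `try_parse` is a hypothetical helper name, and 19000100 is the sample value used in the answers further down.

```python
from datetime import datetime


def try_parse(raw):
    """Return a datetime for a yyyymmdd value, or None if it is not a real date."""
    try:
        return datetime.strptime(str(raw), "%Y%m%d")
    except ValueError:
        # A day of "00" cannot be represented as a date, so parsing fails.
        return None


print(try_parse(19901120))  # a valid date parses normally
print(try_parse(19000100))  # day 00 is rejected, so this prints None
```

Any strict date parser has to reject such a value, which is why the only choices are coercing it, changing it, or treating it as text.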

I tried replacing errors='coerce' with errors='ignore' just to see whether it would get past the check, but it doesn't.

I want to print/reformat the invalid data without coercing the values. Is there any way around this?

Here is my line of code for that:

df['charge_off_date'] = pd.to_datetime(hals2['charge_off_date'], format='%Y%m%d', errors='ignore')
df['charge_off_date'] = df['charge_off_date'].dt.strftime('%m/%d/%Y')

CodePudding user response:

If it's invalid, you cannot format it as a date, imho. You can treat it as a string, though: knowing that it's in yyyymmdd format, you can format the string in a custom function and apply that function to your column.

def format_invalid_date(d: int) -> str:
    d = str(d)
    return f"{d[:4]}/{d[4:6]}/{d[6:]}"

df['charge_off_date'] = df['charge_off_date'].apply(format_invalid_date)

That should convert 19000100 to 1900/01/00, which is still invalid as a date, but looks like a date format.
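The question asks for mm/dd/yyyy rather than yyyy/mm/dd, so the same slicing idea can be rearranged. This is a minimal sketch, not a definitive implementation: `to_mm_dd_yyyy` is a hypothetical name, and passing non-8-digit values through unchanged is an assumption about how bad input should be handled.

```python
def to_mm_dd_yyyy(value):
    """Rearrange a yyyymmdd value into mm/dd/yyyy without validating it."""
    s = str(value)
    if len(s) != 8 or not s.isdigit():
        # Assumption: anything that is not exactly 8 digits is returned as-is.
        return s
    return f"{s[4:6]}/{s[6:8]}/{s[:4]}"


print(to_mm_dd_yyyy(19000100))  # 01/00/1900
print(to_mm_dd_yyyy(20131202))  # 12/02/2013
```

It can be applied to the column the same way, with df['charge_off_date'].apply(to_mm_dd_yyyy).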

CodePudding user response:

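import datetime

def format_invalid_date(date):
    # Split the yyyymmdd value into its numeric parts.
    year = int(str(date)[0:4])
    month = int(str(date)[4:6])
    day = int(str(date)[6:8])

    # Day 0 does not exist, so substitute the first of the month.
    # Note that this changes the original data.
    if day == 0:
        day = 1

    date = datetime.datetime(year, month, day).strftime("%Y/%m/%d")

    return date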

   
df['charge_off_date'] = df['charge_off_date'].apply(format_invalid_date)

Example:

import pandas as pd

df = pd.DataFrame({'charge_off_date': [19000100, 19901120, 20131202]})

df['charge_off_date'] = df['charge_off_date'].apply(format_invalid_date)
print(df) 

Output:

  charge_off_date
0      1900/01/01
1      1990/11/20
2      2013/12/02
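Note that the output above turns 19000100 into 1900/01/01, which alters the original value. If the digits must be printed untouched, one possible variation is to validate where possible and fall back to plain text otherwise. This is only a sketch: `reformat_with_flag` is a hypothetical name, and the mm/dd/yyyy output order is taken from the question.

```python
from datetime import datetime


def reformat_with_flag(value):
    """Return (mm/dd/yyyy text, is_valid) for a yyyymmdd value."""
    s = str(value)
    try:
        # Valid dates go through real date parsing and formatting.
        return datetime.strptime(s, "%Y%m%d").strftime("%m/%d/%Y"), True
    except ValueError:
        # Invalid dates are rearranged as plain text, digits left untouched.
        return f"{s[4:6]}/{s[6:8]}/{s[:4]}", False


for raw in [19000100, 19901120, 20131202]:
    print(raw, reformat_with_flag(raw))
```

The flag makes it possible to keep track of which rows held an invalid date while still printing every row.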