Home > database >  pandas datetime doesn't convert the dates properly in python
pandas datetime doesn't convert the dates properly in python

Time:11-17

i have a dataframe data

d=pd.DataFrame({"dat":["01-06-68", "01-06-57","14-02-80","01-01-04","07-11-20"],
                "j":[34,2,1,7,8]})

i want to convert the dat column to "YYYY-MM-DD" format which is currently in dd-mm-yy format

Code using

pd.to_datetime(d.dat)

The output of this is coming out to be wrong

0   2068-01-06
1   2057-01-06
2   1980-02-14
3   2004-01-01
4   2020-07-11
Name: dat, dtype: datetime64[ns]

Problems

  1. it was supposed to be giving output year as 1968 and not 2068
  2. Months and date are also not coming in proper order

Required Output:

    0   1968-06-01
    1   1957-06-01
    2   1980-02-14
    3   2004-01-01
    4   2020-11-07
    Name: dat, dtype: datetime64[ns]

        

CodePudding user response:

Solution with replace in callable for test last digits and then use %Y for match years in YYYY format:

f = lambda x: '19'   x.group() if int(x.group()) > 22 else '20'   x.group()
d.dat = d.dat.str.replace('(\d )$', f, regex=True)
d.dat = pd.to_datetime(d.dat, format='%d-%m-%Y')

print (d)
         dat   j
0 1968-06-01  34
1 1957-06-01   2
2 1980-02-14   1
3 2004-01-01   7
4 2020-11-07   8

Or subtract 100 years if year greater like 2022:

d.dat = pd.to_datetime(d.dat, format='%d-%m-%y')

d.dat = d.dat.mask(d.dat.dt.year.gt(2022), d.dat - pd.offsets.DateOffset(years=100))
print (d)

         dat   j
0 1968-06-01  34
1 1957-06-01   2
2 1980-02-14   1
3 2004-01-01   7
4 2020-11-07   8
  • Related