I have a DataFrame d:
import pandas as pd

d = pd.DataFrame({"dat": ["01-06-68", "01-06-57", "14-02-80", "01-01-04", "07-11-20"],
                  "j": [34, 2, 1, 7, 8]})
I want to convert the dat column, which is currently in dd-mm-yy format, to YYYY-MM-DD format.
Code I am using:
pd.to_datetime(d.dat)
The output is wrong:
0 2068-01-06
1 2057-01-06
2 1980-02-14
3 2004-01-01
4 2020-07-11
Name: dat, dtype: datetime64[ns]
Problems:
- The year should be 1968, not 2068.
- The day and month are swapped.
Required Output:
0 1968-06-01
1 1957-06-01
2 1980-02-14
3 2004-01-01
4 2020-11-07
Name: dat, dtype: datetime64[ns]
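The two problems listed above have separate causes. Without a format, pd.to_datetime reads a string like 01-06-68 month-first (dayfirst defaults to False), which swaps day and month. An explicit format fixes the order, but the %y directive follows the same rule as Python's strptime: 00-68 become 2000-2068 and 69-99 become 1969-1999, so 68 still turns into 2068. A minimal check, using a plain Series s built from the sample data:

```python
import pandas as pd

s = pd.Series(["01-06-68", "01-06-57", "14-02-80", "01-01-04", "07-11-20"])

# An explicit format puts day and month in the right order ...
parsed = pd.to_datetime(s, format="%d-%m-%y")

# ... but %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999,
# so '68' and '57' still land in the 2000s.
print(parsed.dt.year.tolist())   # [2068, 2057, 1980, 2004, 2020]
print(parsed.dt.month.tolist())  # [6, 6, 2, 1, 11]
```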
Answer:
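One solution: use Series.str.replace with a callable that checks the trailing two-digit year and prepends the century (19 if the value is greater than 22, otherwise 20), then parse with %Y, which matches four-digit years: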
# Prepend '19' to two-digit years above 22, '20' otherwise
f = lambda x: '19' + x.group() if int(x.group()) > 22 else '20' + x.group()
d.dat = d.dat.str.replace(r'(\d+)$', f, regex=True)
d.dat = pd.to_datetime(d.dat, format='%d-%m-%Y')
print(d)
dat j
0 1968-06-01 34
1 1957-06-01 2
2 1980-02-14 1
3 2004-01-01 7
4 2020-11-07 8
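The same century expansion can be sketched without a regex by slicing off the last two characters. This assumes every value is strict dd-mm-yy, ending in exactly two year digits; the helper name to_datetime_with_pivot and its pivot parameter are hypothetical, with pivot=22 reproducing the "> 22" cutoff used in the lambda:

```python
import pandas as pd


def to_datetime_with_pivot(s: pd.Series, pivot: int = 22) -> pd.Series:
    """Hypothetical helper: parse strict dd-mm-yy strings, reading
    yy <= pivot as 20yy and yy > pivot as 19yy."""
    yy = s.str[-2:]  # trailing two-digit year, e.g. "68"
    # "20" where yy <= pivot, "19" everywhere else
    century = pd.Series("20", index=s.index).where(yy.astype(int) <= pivot, "19")
    full = s.str[:-2] + century + yy  # "01-06-68" -> "01-06-1968"
    return pd.to_datetime(full, format="%d-%m-%Y")


d = pd.DataFrame({"dat": ["01-06-68", "01-06-57", "14-02-80", "01-01-04", "07-11-20"],
                  "j": [34, 2, 1, 7, 8]})
d["dat"] = to_datetime_with_pivot(d["dat"])
print(d)
```

Years up to and including the pivot go to the 2000s and everything above it to the 1900s, so the pivot has to move forward as newer data arrives.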
Or parse with %y and subtract 100 years from every date whose year is greater than 2022:
# Starting again from the original d; %y maps 00-68 to 2000-2068
d.dat = pd.to_datetime(d.dat, format='%d-%m-%y')
d.dat = d.dat.mask(d.dat.dt.year.gt(2022), d.dat - pd.offsets.DateOffset(years=100))
print(d)
dat j
0 1968-06-01 34
1 1957-06-01 2
2 1980-02-14 1
3 2004-01-01 7
4 2020-11-07 8
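The hard-coded 2022 in the mask will go stale over time. A sketch of the same approach with the cutoff passed in as a parameter (parse_ddmmyy and max_year are hypothetical names, not part of pandas):

```python
import pandas as pd


def parse_ddmmyy(s: pd.Series, max_year: int) -> pd.Series:
    """Hypothetical helper: parse dd-mm-yy strings, then shift any date
    whose year is after max_year back by 100 years."""
    # %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
    ts = pd.to_datetime(s, format="%d-%m-%y")
    too_late = ts.dt.year > max_year
    return ts.mask(too_late, ts - pd.DateOffset(years=100))


d = pd.DataFrame({"dat": ["01-06-68", "01-06-57", "14-02-80", "01-01-04", "07-11-20"],
                  "j": [34, 2, 1, 7, 8]})
d["dat"] = parse_ddmmyy(d["dat"], max_year=2022)
print(d)
```

Because %y already places 69-99 in the 1900s, this approach can only move dates earlier. Passing pd.Timestamp.today().year as max_year keeps the cutoff tied to the current year instead of a literal.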