I am new to pandas and I am trying to convert an int column to a date column.
The ints in the df look like 10712 (first the day, then the month, then the year).
I tried solving this with:
df_date = pd.to_datetime(df['Date'], format='%d%m%Y')
but I always get the following ValueError:
time data '10712' does not match format '%d%m%Y' (match)
Thank you for your help :)
Answer:
You should use %y (2-digit year) instead of %Y (4-digit year). But that is not enough: the format %d%m%y converts 10712 to 10-07-2012, not to 1-07-2012 as you expect.
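That's because of the following feature of the underlying strptime: when used with the strptime() method, the leading zero is optional for %d and %m (among other directives). The parser therefore takes "10" as the day, leaving "7" for the month and "12" for the year.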
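You can see the same behaviour outside pandas with the standard-library datetime.strptime, which uses the same directives. This small sketch (not part of the original answer) parses the same digits with and without the missing leading zero:

```python
from datetime import datetime

# Without the leading zero, %d takes "10" as the day,
# leaving "7" for %m and "12" for %y.
ambiguous = datetime.strptime("10712", "%d%m%y")

# With the zero restored, each field gets exactly two digits.
padded = datetime.strptime("010712", "%d%m%y")

print(ambiguous.date())  # 2012-07-10
print(padded.date())     # 2012-07-01
```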
A workaround is to convert the values to a layout that strptime (and to_datetime) can parse unambiguously:
>>> import pandas as pd
>>> df = pd.DataFrame({'date': [10712, 20813, 30914]})
>>> df
    date
0  10712
1  20813
2  30914
>>> df1 = df.date.astype(str).str.replace(r'(\d+)(\d\d)(\d\d)',
...                                       r'\2/\1/\3', regex=True)
>>> df1
0    07/1/12
1    08/2/13
2    09/3/14
>>> pd.to_datetime(df1, format='%m/%d/%y')
0   2012-07-01
1   2013-08-02
2   2014-09-03
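A shorter alternative (not from the original answer) is to restore the missing leading zero with Series.str.zfill and then parse with %d%m%y directly. This sketch assumes that only the day can lose its leading zero, i.e. the month and year are always two digits, as in the example values:

```python
import pandas as pd

df = pd.DataFrame({'date': [10712, 20813, 30914]})

# Assumption: only the day may be missing its leading zero, so every
# value has 5 or 6 digits. Left-pad to 6 digits: 10712 -> "010712".
padded = df['date'].astype(str).str.zfill(6)

# Now %d, %m and %y each consume exactly two digits.
df['parsed'] = pd.to_datetime(padded, format='%d%m%y')
print(df)
```

If some days already have two digits (e.g. 251299), zfill leaves them unchanged, so both cases parse with the same format.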
Answer:
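Use the %y year specifier to parse a year without century digits:
In [654]: pd.to_datetime(10712, format='%d%m%y')
Out[654]: Timestamp('2012-07-10 00:00:00')
Note that this reads 10712 as 10 July 2012, not 1 July 2012, because %d consumes two digits when it can.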
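If you would rather avoid string parsing altogether, the integer can be split arithmetically and the parts passed to pd.to_datetime, which can assemble dates from a DataFrame with year, month and day columns. This is a sketch not taken from any of the answers; it assumes the layout day * 10000 + month * 100 + two-digit year, and that every year falls in 2000-2099:

```python
import pandas as pd

df = pd.DataFrame({'date': [10712, 20813, 30914]})
n = df['date']

# Split the integer D[D]MMYY into its parts with integer arithmetic.
parts = pd.DataFrame({
    'year': 2000 + n % 100,     # assumption: all years are 20xx
    'month': (n // 100) % 100,
    'day': n // 10000,
})

# to_datetime assembles one date per row from the year/month/day columns.
df['parsed'] = pd.to_datetime(parts)
print(df)
```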
Answer:
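pandas.to_datetime only parses a string that matches the given format. '13000101' is valid as '%Y%m%d' (1 January 1300), but not as %d%m%Y (month 00) or %m%d%Y (month 13), so with errors='ignore' the input is returned unchanged.
Example: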
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>> pd.to_datetime('13000101', format='%d%m%Y', errors='ignore')
'13000101'
>>> pd.to_datetime('13000101', format='%m%d%Y', errors='ignore')
'13000101'
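When checking whether a format fits your data, errors='coerce' can be more informative than errors='ignore': each value that does not match becomes NaT instead of the input being handed back unchanged (errors='ignore' is also deprecated as of pandas 2.2). A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series(['01072012', '13000101'])

# '01072012' matches %d%m%Y (1 July 2012); '13000101' would need month 00,
# so it cannot be parsed and is replaced by NaT.
result = pd.to_datetime(s, format='%d%m%Y', errors='coerce')
print(result)

# Count how many values failed to parse.
print(result.isna().sum())
```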