I have no idea how to build an MWE for this, because the behaviour only appears when I slice my dataframe. See the console output below.
With the first slice, I get NaT for the last rows:
In [161]: df.iloc[3802:11775].astype({"date": "datetime64"})[["date"]]
Out[161]:
date
258 2014-09-14
259 2018-10-12
259 2018-10-12
259 2018-10-12
259 2018-10-12
.. ...
781 NaT
781 NaT
781 NaT
781 NaT
781 NaT
[7973 rows x 1 columns]
If I change the slice to start at 3803, it works correctly:
In [162]: df.iloc[3803:11775].astype({"date": "datetime64"})[["date"]]
Out[162]:
date
259 2018-10-12
259 2018-10-12
259 2018-10-12
259 2018-10-12
259 2018-10-12
.. ...
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
[7972 rows x 1 columns]
And if I concatenate the first and last parts (I thought there might be something wrong with the row at position 3802), it still works:
In [165]: pd.concat([df.iloc[3800:3803], df.iloc[11770:11775]]).astype({"date": "datetime64"})[["date"]]
Out[165]:
date
258 2014-09-14
258 2014-09-14
258 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
I have been using pandas for more than 3 years, but here I am completely lost.
EDIT
I have added the series to a gist.
The bug is reproducible in this Colab notebook with pandas 1.3.0 and is fixed in pandas 1.3.3.
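To check whether a given pandas installation shows the problem on a particular slice, a sketch like the following counts values that were present before astype() but came out as NaT. The helper name count_introduced_nat and the sample data are hypothetical, and it assumes the column holds date strings. It uses the explicit "datetime64[ns]" dtype because pandas 2.x rejects the unit-less "datetime64" used in the question.

```python
import pandas as pd


def count_introduced_nat(df, start, stop, column="date"):
    """Hypothetical helper: count rows of df.iloc[start:stop] that held a value
    before astype() but came out as NaT afterwards."""
    part = df.iloc[start:stop]
    # Convert only the slice, as in the question, but with an explicit unit.
    converted = part.astype({column: "datetime64[ns]"})[column]
    # Compare positionally via NumPy arrays so a duplicated index
    # (like the 258/259/781 labels in the question) cannot misalign rows.
    was_present = part[column].notna().to_numpy()
    became_nat = converted.isna().to_numpy()
    return int((was_present & became_nat).sum())


# Hypothetical sample data; in practice, pass the real dataframe and the
# slice bounds that misbehave, e.g. count_introduced_nat(df, 3802, 11775).
sample = pd.DataFrame(
    {"date": ["2014-09-14", "2018-10-12", "2018-10-12", "2014-09-14"]},
    index=[258, 259, 259, 781],
)
print(count_introduced_nat(sample, 0, 4))
```

Values that were already missing before conversion are not counted, so a non-zero result points specifically at dates lost during the astype() call.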
Answer:
I was unable to reproduce your error:
In [31]: df = pd.read_csv("date_error.csv")
In [32]: df.iloc[3802:11775].astype({"date": "datetime64"})
Out[32]:
Unnamed: 0 date
3802 258 2014-09-14
3803 259 2018-10-12
3804 259 2018-10-12
3805 259 2018-10-12
3806 259 2018-10-12
... ... ...
11770 781 2014-09-14
11771 781 2014-09-14
11772 781 2014-09-14
11773 781 2014-09-14
11774 781 2014-09-14
[7973 rows x 2 columns]
In [33]: df["date"].isna().any()
Out[33]: False
That said, I'd suggest using pd.to_datetime instead of astype().
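A minimal sketch of that suggestion, using hypothetical sample data and assuming the dates are ISO strings like those in the question: convert the whole column once with pd.to_datetime, then slice the already-converted frame.

```python
import pandas as pd

# Hypothetical sample data: ISO-formatted date strings, as in the question's CSV.
df = pd.DataFrame({"date": ["2014-09-14", "2018-10-12", "2018-10-12", "2014-09-14"]})

# Convert the whole column once with pd.to_datetime instead of astype().
# An explicit format makes parsing unambiguous, and errors="raise" (the
# default) fails loudly on a bad value instead of silently producing NaT.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Slicing afterwards only selects rows; no further parsing happens.
subset = df.iloc[1:4]
print(subset["date"].isna().any())  # False
```

If some values are genuinely unparseable, errors="coerce" turns them into NaT instead of raising, which makes them easy to locate with isna().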