pandas astype("datetime64") is not consistent and does not raise

Time:09-22

I have absolutely no idea how to build an MWE for this, because the behaviour only appears when I slice my dataframe; see the console output below:

First slice: I get NaT for the last rows:

In [161]: df.iloc[3802:11775].astype({"date": "datetime64"})[["date"]]
Out[161]:
          date
258 2014-09-14
259 2018-10-12
259 2018-10-12
259 2018-10-12
259 2018-10-12
..         ...
781        NaT
781        NaT
781        NaT
781        NaT
781        NaT

[7973 rows x 1 columns]

If I update the slice to start from 3803, it works well:

In [162]: df.iloc[3803:11775].astype({"date": "datetime64"})[["date"]]
Out[162]:
          date
259 2018-10-12
259 2018-10-12
259 2018-10-12
259 2018-10-12
259 2018-10-12
..         ...
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14

[7972 rows x 1 columns]

And if I concat the first and last parts, because I thought there might be something wrong with the row at position 3802, it still works:

In [165]: pd.concat([df.iloc[3800:3803], df.iloc[11770:11775]]).astype({"date": "datetime64"})[["date"]]
Out[165]:
          date
258 2014-09-14
258 2014-09-14
258 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14
781 2014-09-14

I have to say that I have been using pandas for more than 3 years but here I am completely lost.

EDIT

I have added the series in a gist

The bug is visible in this colab with pandas 1.3.0 and actually fixed in pandas 1.3.3
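For anyone checking whether their own environment is affected, here is a minimal sketch of one way to spot values that silently turned into NaT during a conversion. The helper name `silent_nat_rows` and the sample data are made up for illustration (they are not from the gist), and `pd.to_datetime(..., errors="coerce")` stands in for whichever conversion you want to audit:

```python
import pandas as pd


def silent_nat_rows(original: pd.Series, converted: pd.Series) -> pd.Series:
    """Return the original values that were non-null but became NaT."""
    # Use a positional mask so duplicated index labels (as in the question)
    # do not cause alignment problems.
    lost = (converted.isna() & original.notna()).to_numpy()
    return original[lost]


# Hypothetical sample data; the last value cannot be parsed as a date.
raw = pd.Series(["2014-09-14", "2018-10-12", "not-a-date"], index=[258, 259, 781])
converted = pd.to_datetime(raw, errors="coerce")

bad = silent_nat_rows(raw, converted)
print("pandas version:", pd.__version__)
print(bad)
```

If `bad` is non-empty for input that is known to contain only valid dates, the conversion is dropping values, and upgrading pandas (1.3.3 or later, per the note above) is worth trying first.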

CodePudding user response:

I was unable to reproduce your error:

In [31]: df = pd.read_csv("date_error.csv")

In [32]: df.iloc[3802:11775].astype({"date": "datetime64"})
Out[32]:
       Unnamed: 0       date
3802          258 2014-09-14
3803          259 2018-10-12
3804          259 2018-10-12
3805          259 2018-10-12
3806          259 2018-10-12
...           ...        ...
11770         781 2014-09-14
11771         781 2014-09-14
11772         781 2014-09-14
11773         781 2014-09-14
11774         781 2014-09-14

[7973 rows x 2 columns]

In [33]: df["date"].isna().any()
Out[33]: False

That said, I'd suggest using pd.to_datetime instead of astype().
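A minimal sketch of that suggestion, using a small made-up frame rather than the real `date_error.csv`. The explicit `format="%Y-%m-%d"` is an assumption based on the dates shown in the output above; `errors="raise"` makes an unparseable value fail loudly instead of turning into NaT:

```python
import pandas as pd

# Hypothetical stand-in for the "date" column from the question.
df = pd.DataFrame(
    {"date": ["2014-09-14", "2018-10-12", "2018-10-12", "2014-09-14"]},
    index=[258, 259, 259, 781],
)

# Parse with an explicit format; invalid input raises instead of becoming NaT.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="raise")
print(df.dtypes)

# A value that does not match the format is reported, not silently dropped.
raised = False
try:
    pd.to_datetime(pd.Series(["2014-09-14", "14/09/2014"]),
                   format="%Y-%m-%d", errors="raise")
except ValueError as exc:
    raised = True
    print("conversion failed:", exc)
```

Also note that newer pandas releases reject the unit-less `"datetime64"` string in `astype`, so if you keep `astype` you will need to spell out a unit such as `"datetime64[ns]"`.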
