Why pandas strftime raises a FutureWarning and how to avoid it?

Time:04-03

I need to convert a date column to strings. It works, but I'm having a couple of issues. First of all, the output does not have a string dtype:

import pandas as pd  # No warning raised
exit_format = '%d-%m-%Y'
series = pd.Series([1,2,None] ,dtype='datetime64[ns]')
series.dt.strftime(exit_format)

This (intended) behaviour is not a big deal, as it can be fixed with astype('string') and by replacing the NaN values. The bigger problem is that if all values are NaN/NaT, I get this FutureWarning:

# Warning raised!
series = pd.Series([None,None,None] ,dtype='datetime64[ns]')
series.dt.strftime(exit_format)

FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)

This seems to come from a known issue in pandas versions >= 1.4.0 (mine is '1.4.1'). My question is: how can I build a clean workaround, and why does this warning appear in the first place when all values are NaN? Preferably, I'm looking for a solution that addresses the origin of the warning rather than just suppressing it.

Basically, I'm looking for a function that converts a date column to a string column, turning NaNs into empty strings or handling errors in a better way (for example with a default value), without raising a warning.

PS: a possible solution could be adding a non-empty row at the end and deleting it afterwards, but I was wondering whether there is a built-in function that handles this properly, without resorting to tricks.

CodePudding user response:

I had this problem recently too. To avoid the warning, I simply check that the date column contains non-NaN values before calling strftime.

example:

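# Only format when at least one date is present; the all-NaT case is
# the one that triggers the FutureWarning.
if df['date'].notna().any():
    df['date'] = df['date'].dt.strftime('%b %d, %Y')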

Or, if you need to check whether any or all values in a date column are NaN:

Syntax:

df['your column name'].isnull().values.any()
df['your column name'].isnull().values.all()
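The all-NaN check above can be wrapped in a small guard, sketched here under some assumptions: the helper name `strftime_unless_all_null` and the sample dates are hypothetical and only serve as illustration. The idea is to return the column untouched when it holds no valid date, so that `.dt.strftime` is never called in the case that triggers the warning.

```python
import pandas as pd


def strftime_unless_all_null(column, fmt):
    """Format dates as strings, skipping columns with no valid date.

    Hypothetical helper for illustration; not part of pandas.
    """
    if column.isnull().all():
        # Nothing to format: return the column as-is so that
        # .dt.strftime is never called on an all-NaT column.
        return column
    return column.dt.strftime(fmt)


# Made-up sample data: one valid date and one missing value.
df = pd.DataFrame({"date": pd.to_datetime(["2022-04-03", None])})
df["date"] = strftime_unless_all_null(df["date"], "%b %d, %Y")
print(df["date"].tolist())
```

Note that an all-NaN column keeps its datetime dtype with this guard, so callers that need strings in every case still have to convert it themselves.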

CodePudding user response:

Here is a slightly more idiomatic way to handle NaT values (pandas >= 1.0.0), as rightly suggested by @finavatar:

import pandas as pd

series = pd.Series([None, None, None], dtype="datetime64[ns]")

series = series.apply(lambda x: x.strftime("%d-%m-%Y") if x is not pd.NaT else x)

print(series)  # No warning message
# Output
0   NaT
1   NaT
2   NaT
dtype: datetime64[ns]

And with a non-empty Series:

import pandas as pd

series = pd.Series(["04/03/2022", None, None], dtype="datetime64[ns]")

series = series.apply(lambda x: x.strftime("%d-%m-%Y") if x is not pd.NaT else x)

print(series)  # No warning message
0    03-04-2022
1           NaT
2           NaT
dtype: object  # Python strings (plus NaT) stored in an object column
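To also get a real string dtype with empty strings in place of NaT, as the question asks, the same element-wise idea can be taken one step further. This is only a minimal sketch: the helper name `dates_to_strings` and its `default` parameter are hypothetical, not part of pandas. It formats each value in plain Python, so `.dt.strftime` is never called on an all-NaT column.

```python
import pandas as pd


def dates_to_strings(series, fmt, default=""):
    """Return a string-dtype Series, with `default` wherever the date is missing.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Iterating a datetime Series yields Timestamp objects or NaT.
    values = [default if pd.isna(ts) else ts.strftime(fmt) for ts in series]
    return pd.Series(values, index=series.index, dtype="string")


series = pd.Series(["04/03/2022", None, None], dtype="datetime64[ns]")
print(dates_to_strings(series, "%d-%m-%Y").tolist())
# ['03-04-2022', '', '']

all_missing = pd.Series([None, None, None], dtype="datetime64[ns]")
print(dates_to_strings(all_missing, "%d-%m-%Y").tolist())
# ['', '', '']
```

The loop is not vectorized, so it may be slower than `.dt.strftime` on large columns; the trade-off is that the result has the same dtype whether or not the column contains any valid dates.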