I need to convert a date column to strings. It works, but I'm running into some issues. First of all, the output does not have a string dtype:
import pandas as pd # No warning raised
exit_format = '%d-%m-%Y'
series = pd.Series([1,2,None] ,dtype='datetime64[ns]')
series.dt.strftime(exit_format)
This (intended behaviour) is not a big deal, as it can be fixed with astype('string') and by replacing the NaN values. The worse problem is that if all values are NaN/NaT, I get this FutureWarning:
# Warning raised!
series = pd.Series([None,None,None] ,dtype='datetime64[ns]')
series.dt.strftime(exit_format)
FutureWarning: In a future version, the Index constructor will not infer numeric dtypes when passed object-dtype sequences (matching Series behavior)
This seems to come from a known issue in pandas versions >= 1.4.0 (mine is '1.4.1'). My question is: how can I write a clean workaround, and why does this warning appear in the first place when all values are NaN? Preferably, I'm looking for a solution that addresses the origin of the warning rather than suppressing it directly.
Basically, I'm looking for a function that converts a date column to a string column, either turning NaNs into empty strings or handling them in a better way (like using a default value), without raising a warning.
PS: a possible solution could be adding a non-empty row at the end and deleting it afterwards, but I was wondering whether there is a built-in function that actually works well, without resorting to tricks...
CodePudding user response:
I had this problem too recently. What I did to bypass the warning was simply to check that the date column contains at least one non-NaN value before using strftime.
Example:
# Only format when the column is not entirely NaN/NaT (the case that raises the warning)
if df['date'].notna().any():
    df['date'] = df['date'].dt.strftime('%b %d, %Y')
Or, if you need to check whether ANY or ALL values in a date column are NaN:
Syntax:
df['your column name'].isnull().values.any()
df['your column name'].isnull().values.all()
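The guard can also be wrapped in a small helper. This is only a minimal sketch, not a built-in pandas function: the name strftime_or_empty is hypothetical, and it assumes the input is a datetime64 Series. When every value is missing, it skips the .dt.strftime call (the one that raised the FutureWarning in the question) and returns empty strings directly.

```python
import pandas as pd


def strftime_or_empty(series, fmt):
    """Format a datetime Series as strings, mapping missing values to ''.

    Hypothetical helper for illustration; assumes a datetime64 input.
    """
    # All-missing (or empty) input: avoid .dt.strftime altogether.
    if series.isna().all():
        return pd.Series([""] * len(series), index=series.index, dtype="string")
    # Otherwise format, convert to the pandas string dtype, and fill the gaps.
    return series.dt.strftime(fmt).astype("string").fillna("")


mixed = pd.Series(["2022-04-03", None], dtype="datetime64[ns]")
empty = pd.Series([None, None, None], dtype="datetime64[ns]")
print(strftime_or_empty(mixed, "%d-%m-%Y").tolist())
print(strftime_or_empty(empty, "%d-%m-%Y").tolist())
```

The early return keeps the fast vectorised path for the normal case and only special-cases the all-missing column.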
CodePudding user response:
Here is a slightly more idiomatic way to take NaT values into account (pandas >= 1.0.0), as suggested by @finavatar:
import pandas as pd
series = pd.Series([None, None, None], dtype="datetime64[ns]")
series = series.apply(lambda x: x.strftime("%d-%m-%Y") if x is not pd.NaT else x)
print(series) # No warning message
# Output
0   NaT
1   NaT
2   NaT
dtype: datetime64[ns]
And with a non empty Series:
import pandas as pd
series = pd.Series(["04/03/2022", None, None], dtype="datetime64[ns]")
series = series.apply(lambda x: x.strftime("%d-%m-%Y") if x is not pd.NaT else x)
print(series) # No warning message
# Output
0    03-04-2022
1           NaT
2           NaT
dtype: object # Python str values plus NaT, not the pandas 'string' dtype
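If the goal is the string column with empty strings that the question asks for, the same per-element check can build a 'string' dtype result directly. The following is a minimal sketch, assuming a datetime64 input; the helper name dates_to_strings and its na_value parameter are made up for illustration and are not part of pandas:

```python
import pandas as pd


def dates_to_strings(series, fmt="%d-%m-%Y", na_value=""):
    """Return a 'string' dtype Series, replacing NaT with na_value.

    Hypothetical helper for illustration; assumes a datetime64 input.
    """
    # Iterating a datetime64 Series yields Timestamp objects or NaT.
    formatted = [
        ts.strftime(fmt) if pd.notna(ts) else na_value
        for ts in series
    ]
    return pd.Series(formatted, index=series.index, dtype="string")


series = pd.Series(["2022-04-03", None, None], dtype="datetime64[ns]")
result = dates_to_strings(series)
print(result.tolist())
```

Because it never calls .dt.strftime, this version does not go through the code path that produced the warning in the question, at the cost of a Python-level loop over the values.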