When a column of a pandas DataFrame contains only NaN, str.fullmatch raises:
AttributeError: Can only use .str accessor with string values!
The following two cases behave as expected:
import numpy as np
import pandas as pd
data1 = [ ['2022-03-15 00:00:00'], [np.nan] ]
df = pd.DataFrame(data1, columns = ['Date'] )
df = df.loc[ df.Date.str.fullmatch( r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True ) ]
print(df)
data1 = [ [np.nan], ['2022-03-15 00:00:00'] ]
df = pd.DataFrame(data1, columns = ['Date'] )
df = df.loc[ df.Date.str.fullmatch( r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True ) ]
print(df)
The error is raised only when the column is entirely NaN:
data1 = [ [np.nan], [np.nan] ]
df = pd.DataFrame(data1, columns = ['Date'] )
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df = df.loc[ df.Date.str.fullmatch(dateRegex, na=True) ]
Shouldn't na=True fill the NaN values with True, so that loc accepts the mask as it does in the two cases above?
CodePudding user response:
When you create a Series with only NaN values, its dtype is float64, since NaN is a float:
>>> s = pd.Series([np.nan, np.nan])
>>> s.dtype
dtype('float64')
>>> s.str
...
AttributeError: Can only use .str accessor with string values!
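The same difference shows up on the two kinds of column from the question. The sketch below only inspects the inferred dtypes; the variable names are mine, and it assumes a reasonably recent pandas and NumPy:

```python
import numpy as np
import pandas as pd

# Mixed column: one string plus NaN, so pandas cannot pick a float dtype.
mixed = pd.DataFrame([["2022-03-15 00:00:00"], [np.nan]], columns=["Date"])

# All-NaN column: every value is a float, so pandas infers float64.
all_nan = pd.DataFrame([[np.nan], [np.nan]], columns=["Date"])

mixed_is_float = pd.api.types.is_float_dtype(mixed["Date"])
all_nan_is_float = pd.api.types.is_float_dtype(all_nan["Date"])

# Only the all-NaN column is float, which is why only it rejects .str.
print(mixed["Date"].dtype, mixed_is_float)
print(all_nan["Date"].dtype, all_nan_is_float)
```

So the na=True argument never gets a chance to apply: the error comes from accessing .str on a float column, before fullmatch is called at all.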
You need to convert it to the object dtype (not necessarily str) before you can use .str:
>>> s.astype(object).str
<pandas.core.strings.accessor.StringMethods at 0x122deb1c0>
So, applied to your example:
data1 = [ [np.nan], [np.nan] ]
df = pd.DataFrame(data1, columns = ['Date'])
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df.Date = df.Date.astype(object) # <--- Add this line
df = df.loc[ df.Date.str.fullmatch(dateRegex, na=True) ]
Output:
>>> df
  Date
0  NaN
1  NaN
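If you run this filter in several places, you could wrap the cast and the match in a small helper so the all-NaN case is always covered. This is just a sketch: filter_midnight_dates and DATE_REGEX are names I made up, not anything from pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical constant: the regex from the question, as a raw string.
DATE_REGEX = r"[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00"


def filter_midnight_dates(df, column="Date"):
    """Keep rows whose value in `column` matches DATE_REGEX or is NaN."""
    # Cast to object first so .str is available even for an all-NaN column.
    as_object = df[column].astype(object)
    mask = as_object.str.fullmatch(DATE_REGEX, na=True)
    return df.loc[mask]


all_nan = pd.DataFrame({"Date": [np.nan, np.nan]})
mixed = pd.DataFrame(
    {"Date": ["2022-03-15 00:00:00", np.nan, "2022-03-15 12:30:00"]}
)

print(filter_midnight_dates(all_nan))
print(filter_midnight_dates(mixed))
```

The cast is done on a temporary Series, so the original column keeps whatever dtype it had; only the boolean mask is used for selection.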