Pandas str.fullmatch unusual behaviour with NaN

Time:03-17

When a column of a pandas DataFrame contains only NaN, str.fullmatch raises:

AttributeError: Can only use .str accessor with string values!

The following two examples behave as expected:

import numpy as np
import pandas as pd

data1 = [ ['2022-03-15 00:00:00'], [np.nan] ]
df = pd.DataFrame(data1, columns = ['Date'] )
df = df.loc[ df.Date.str.fullmatch( r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True ) ]
print(df)

data1 = [ [np.nan], ['2022-03-15 00:00:00'] ]
df = pd.DataFrame(data1, columns = ['Date'] )
df = df.loc[ df.Date.str.fullmatch( r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00', na=True ) ]
print(df)

The error is raised only when the column is entirely NaN:

data1 = [ [np.nan], [np.nan] ]
df = pd.DataFrame(data1, columns = ['Date'] )
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df = df.loc[ df.Date.str.fullmatch(dateRegex, na=True) ]

Shouldn't na=True fill the NaN entries with True, so that loc accepts the mask just as it does in the two examples above?

CodePudding user response:

When you create a Series with only NaN values, the dtype of the Series is float since NaN is a float:

>>> s = pd.Series([np.nan, np.nan])
>>> s.dtype
dtype('float64')

>>> s.str
...
AttributeError: Can only use .str accessor with string values!
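
To see the inference difference side by side, here is a minimal sketch comparing a mixed string/NaN Series with an all-NaN one. The variable names are illustrative only, and the exact dtype reported for the mixed Series depends on the pandas version (object on classic releases, possibly a dedicated string dtype on newer ones), so no output is shown for it.

```python
import numpy as np
import pandas as pd

# One real string plus NaN: pandas keeps a string-capable dtype
mixed = pd.Series(["2022-03-15 00:00:00", np.nan])

# Nothing but NaN: there is no string to hint at, so pandas infers float64
all_nan = pd.Series([np.nan, np.nan])

print(mixed.dtype)    # version-dependent, but not float64
print(all_nan.dtype)  # float64

# The .str accessor is only available on the first Series
print(hasattr(mixed, "str"))
```

This is why the first two examples in the question work: a single string in the column is enough to keep pandas from inferring float64.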

You need to convert the Series to the object dtype (it does not have to be str) before you can use .str:

>>> s.astype(object).str
<pandas.core.strings.accessor.StringMethods at 0x122deb1c0>

Applied to the example from the question:

data1 = [ [np.nan], [np.nan] ]
df = pd.DataFrame(data1, columns = ['Date'])
dateRegex = r'[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00'
df.Date = df.Date.astype(object)  # <--- Add this line
df = df.loc[ df.Date.str.fullmatch(dateRegex, na=True) ]

Output:

>>> df
  Date
0  NaN
1  NaN
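
If the same filter runs on many columns, the cast can be folded into a small helper. The sketch below is one possible way to do it, not a pandas API: `fullmatch_mask` and `DATE_REGEX` are hypothetical names, and it assumes the column holds only strings and NaN (a column with actual numbers would still be rejected by .str after the cast).

```python
import numpy as np
import pandas as pd

DATE_REGEX = r"[0-9]{4}-[0-9]{2}-[0-9]{2}\s00:00:00"

def fullmatch_mask(series, pattern, na=True):
    """Hypothetical helper: str.fullmatch mask that also works on all-NaN columns."""
    # Casting to object makes .str usable even when pandas inferred float64
    return series.astype(object).str.fullmatch(pattern, na=na)

all_nan = pd.DataFrame({"Date": [np.nan, np.nan]})
mixed = pd.DataFrame({"Date": ["2022-03-15 00:00:00", np.nan, "not a date"]})

# NaN rows are kept because na=True; only the non-matching string is dropped
kept_all_nan = all_nan.loc[fullmatch_mask(all_nan["Date"], DATE_REGEX)]
kept_mixed = mixed.loc[fullmatch_mask(mixed["Date"], DATE_REGEX)]

print(len(kept_all_nan))  # 2
print(len(kept_mixed))    # 2
```

Casting inside the helper leaves the original column's dtype untouched, which may be preferable when the DataFrame is used elsewhere.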