I am using the following code inside an R magic cell:
%%R -o df
library(tibble)
df <- tibble(x = c("a", "b", NA))
However, when I run the following in another (Python) cell:
df.isna()
I get
x
1 False
2 False
3 False
In fact, the imported dataframe is
x
1 a
2 b
3 NA_character_
How can I convert NA_character_ to a Python NaN?
I have tried
df.replace('NA_character_', np.nan)
but with no success.
Answer:
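As you set out in the comments, the R NA_character_ value is not converted to np.nan. It arrives as an object of a different type, rpy2.rinterface_lib.sexp.NACharacterType, and NA_character_ is only how that object prints. pandas does not recognize the object as missing, which is why isna() returns False and a replace() on the string 'NA_character_' finds nothing to match. The solution is to iterate over the column and convert every value of this type to np.nan: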
import numpy as np
from rpy2.rinterface_lib.sexp import NACharacterType

# Swap each R NA_character_ object in column x for np.nan and store the result
df['x'] = df['x'].apply(
    lambda val: np.nan if isinstance(val, NACharacterType) else val
)
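The fix above handles a single column. If several string columns come back from R, the same isinstance check can be applied to every column at once. The following is a minimal sketch: r_na_to_nan is a hypothetical helper name (not part of rpy2 or pandas), and the NA type is passed in as a parameter so the function itself does not import rpy2.

```python
import numpy as np
import pandas as pd


# Hypothetical helper, not part of rpy2 or pandas.
def r_na_to_nan(df: pd.DataFrame, na_type) -> pd.DataFrame:
    """Return a copy of df in which every cell that is an instance of
    na_type (for example rpy2's NACharacterType) is replaced by np.nan."""

    def convert(val):
        # Keep ordinary values; swap R missing-value objects for NaN
        return np.nan if isinstance(val, na_type) else val

    # DataFrame.apply passes each column (a Series) to the lambda,
    # and Series.map then calls convert on every element of that column.
    return df.apply(lambda col: col.map(convert))
```

With the magic-cell df from the question and NACharacterType imported from rpy2.rinterface_lib.sexp, the call would be df = r_na_to_nan(df, NACharacterType), after which df.isna() should report True for the third row. Because na_type is handed straight to isinstance, a tuple of types is also accepted.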
As for whether this is a bug, the changelog for rpy2 release 3.3.0 states:
The value nan in pandas Series with strings is now converted to R NA (issue #668).
However, the converse (R NA_character_ becoming NaN in pandas) does not appear to happen. I don't know whether that is a bug, a design decision, or simply something that has not been implemented yet.