How to load an RTYPES.NILSXP data object when using rpy2?

rpy2 allows me to use some but not all of the returned values from a function in library(Benchmarking) in Python. How do I get the rest?

Set-up:

import pandas as pd

import rpy2
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects.packages import importr
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)

from rpy2.robjects import pandas2ri

from rpy2.robjects.conversion import localconverter


packnames = ('Benchmarking',)  # trailing comma: a one-element tuple, not a plain string
utils.install_packages(StrVector(packnames))

Benchmarking = importr('Benchmarking')
base = importr('base')

data = pd.read_csv("path/to_data.csv")

with localconverter(robjects.default_converter + pandas2ri.converter):
  crs = Benchmarking.dea(data['Age'], data['CO2'], RTS='crs', ORIENTATION='in')

crs['eff'] or crs['lambda'] work fine and return ndarrays

crs
____________________________________________________________________
o{'eff': [1.    0.625 0.5  ], 'lambda': [[1.   0.   0.  ]
 [1.25 0.   0.  ]
 [1.5  0.   0.  ]], 'objval': [1.    0.625 0.5  ], 'RTS': [1] "crs"
, 'primal': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'dual': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'ux': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'vy': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'gamma': function (x)  .Primitive("gamma")
, 'ORIENTATION': [1] "in"
, 'TRANSPOSE': [1] FALSE
, 'param': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], }

So far so good.

However, there is more useful data that I would like to extract, e.g.:

crs['dual']
_______________________________________________________________
<rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP]

What kind of object is this?

Searching for RTYPES.NILSXP in the 3.5.3 docs leads to a single page, which is the only mention I have found.

I have no idea how to read this. The docs explain that datasets can be serialised R objects or serialised R code that produces the dataset. rpy2 employs 'lazy loading', and to load the data one must use the method fetch(), but I don't seem to be able to use it correctly to load the rest of the outputs from dea(x, y, *args).

Failed attempts to load data


rpy2.robjects.packages.PackageData.fetch(crs['dual'])
_______________________________________________________________
TypeError: PackageData.fetch() missing 1 required positional argument: 'name'

I found that the fetch() method belongs to PackageData. I tried to call it, but now it asks me for the 'name' of this dataset? I thought crs['dual'] was enough information. When I pass in 'dual' as the name parameter I get:

rpy2.robjects.packages.PackageData.fetch(r_from_df_crs['dual'], 'dual')

File ~\anaconda3\envs\UROP_buildings_env\lib\site-packages\rpy2\robjects\packages.py:143, in PackageData.fetch(self, name)
    136 def fetch(self, name):
    137     """ Fetch the dataset (loads it or evaluates the R associated
    138     with it.
    139 
    140     In R, datasets are loaded into the global environment by default
    141     but this function returns an environment that contains the dataset(s).
    142     """
--> 143     if self._datasets is None:
    144         self._init_setlist()
    146     if name not in self._datasets:

AttributeError: 'NULLType' object has no attribute '_datasets'

so I am stuck. How can I deserialise this <RTYPES.NILSXP> object from memory?

CodePudding user response：

Add a custom converter for it:

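import rpy2.rinterface as rinterface
from rpy2.robjects.conversion import Converter

# Converter holding a single rule: R NULL becomes Python None.
null_converter = Converter('R NULL to None')

@null_converter.rpy2py.register(rinterface.NULLType)
def rpy2py_null(obj):
    return None

# Combine the new rule with the converters used in the question.
my_converter = robjects.default_converter + pandas2ri.converter + null_converter

with localconverter(my_converter):
  crs = Benchmarking.dea(data['Age'], data['CO2'], RTS='crs', ORIENTATION='in')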

CodePudding user response：

However, there is more useful data that I would like to extract, e.g.:
crs['dual']
_______________________________________________________________
<rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP]
What kind of object is this?

This is R's NULL object. It is pretty much like NULL in C, or null in Java or JavaScript. With the increasing use of Python for data science, None came to mean either the equivalent of a NULL or a missing value (which is an NA or one of its variants in R). The default conversion returns an R NULL rather than converting it to None, to avoid confusion. This was an early design decision, though. If there is a need to revisit it, an issue should be opened on the project's GitHub page.

Searching for RTYPES.NILSXP in the 3.5.3 docs leads to a single page, which is the only mention I have found.

The mention on that page refers to the default value for a named argument (see https://rpy2.github.io/doc/latest/html/robjects_rpackages.html#rpy2.robjects.packages.PackageData). I see that the documentation is incomplete: lib_loc is an optional path, passed to the class constructor, indicating where the R package is installed.

I have no idea how to read this. The docs explain that datasets can be serialised R objects or serialised R code that produces the dataset. rpy2 employs 'lazy loading', and to load the data one must use the method fetch(), but I don't seem to be able to use it correctly to load the rest of the outputs from dea(x, y, *args).

What is meant here is that "data" objects in R packages are not necessarily serialized R data structures of the kind that the R functions save(), dump(), or dput() produce. They can also be R scripts. For example, an R package can have a data object "myrandnorm100" that is an R script data/myrandnorm100.R in the installed package's directory, and that script will be evaluated using the R function source() (see https://rdrr.io/r/utils/data.html). That script can define an arbitrary number of variables. Note that serialized R data (for example in an .RData file) can also contain several named objects. The design choice for rpy2 was to try to make things a little safer and more predictable by keeping those names within a namespace. Silent name clashes can be at the root of challenging bugs in code.

(...)

rpy2.robjects.packages.PackageData.fetch(crs['dual'])

TypeError: PackageData.fetch() missing 1 required positional argument: 'name'

I found that the fetch() method belongs to PackageData. I tried to call it, but now it asks me for the 'name' of this dataset?

Yes. The PackageData object is like a namespace with as many named objects as the author of the R package wanted to include.

I thought crs['dual'] was enough information. When I pass in 'dual' as the name parameter I get rpy2.robjects.packages.PackageData.fetch(r_from_df_crs['dual'], 'dual')

File ~\anaconda3\envs\UROP_buildings_env\lib\site-packages\rpy2\robjects\packages.py:143, in PackageData.fetch(self, name)
    136 def fetch(self, name):
    137     """ Fetch the dataset (loads it or evaluates the R associated
    138     with it.
    139 
    140     In R, datasets are loaded into the global environment by default
    141     but this function returns an environment that contains the dataset(s).
    142     """
--> 143     if self._datasets is None:
    144         self._init_setlist()
    146     if name not in self._datasets:

AttributeError: 'NULLType' object has no attribute '_datasets'

Well, this is not how PackageData objects are instantiated. I'll use the R package datasets as an example, since it is part of the R standard library.

# Import the R package "datasets"
datasets = importr('datasets')
# That package only contains datasets. The Python object will look
# like it has no (useful) attributes. We can create an instance
# for the data in the package with:
datasets_data = rpackages.data(datasets)
# All dataset names are available through `datasets_data.names()`.
# We know the name of the one we want.
mtcars_env = datasets_data.fetch('mtcars')
# mtcars_env is an R "environment", wrapped as an `rpy2.robjects.Environment`

Note: I see now that the docs do have that information, but it is scattered across examples rather than also being on the page about packages (see https://rpy2.github.io/doc/latest/html/search.html?q=data fetch).