rpy2 allows me to use some but not all of the returned values from a function in library(Benchmarking) in Python. How do I get the rest?
Set-up:
import pandas as pd
import rpy2
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects.packages import importr
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
packnames = ('Benchmarking',)
utils.install_packages(StrVector(packnames))
Benchmarking = importr('Benchmarking')
base = importr('base')
data = pd.read_csv("path/to_data.csv")
with localconverter(robjects.default_converter + pandas2ri.converter):
    crs = Benchmarking.dea(data['Age'], data['CO2'], RTS='crs', ORIENTATION='in')
crs['eff'] or crs['lambda'] work fine and return ndarrays.
crs
____________________________________________________________________
o{'eff': [1. 0.625 0.5 ], 'lambda': [[1. 0. 0. ]
[1.25 0. 0. ]
[1.5 0. 0. ]], 'objval': [1. 0.625 0.5 ], 'RTS': [1] "crs"
, 'primal': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'dual': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'ux': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'vy': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], 'gamma': function (x) .Primitive("gamma")
, 'ORIENTATION': [1] "in"
, 'TRANSPOSE': [1] FALSE
, 'param': <rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP], }
So far so good.
However there is more useful data that I would like to extract eg.
crs['dual']
_______________________________________________________________
<rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP]
What kind of object is this?
Searching up RTYPES.NILSXP in the 3.5.3 docs takes me to a page in the docs which is the only mention I have found.
I have no idea how to read this. The docs explain that datasets can be serialised R objects or serialised R code that produces the dataset. rpy2 employs 'lazy loading', and to load the data one must use the method fetch(), but I don't seem to be able to use it correctly to load the rest of the outputs from dea(x, y, *args).
Failed attempts to load data
rpy2.robjects.packages.PackageData.fetch(crs['dual'])
_______________________________________________________________
TypeError: PackageData.fetch() missing 1 required positional argument: 'name'
I've found that the fetch() method belongs to PackageData. I've tried to call it but now it asks me for the 'name' of this dataset?? I thought crs['dual'] was enough information. When I pass in 'dual' as the name parameter I get
rpy2.robjects.packages.PackageData.fetch(r_from_df_crs['dual'], 'dual')
File ~\anaconda3\envs\UROP_buildings_env\lib\site-packages\rpy2\robjects\packages.py:143, in PackageData.fetch(self, name)
136 def fetch(self, name):
137 """ Fetch the dataset (loads it or evaluates the R associated
138 with it.
139
140 In R, datasets are loaded into the global environment by default
141 but this function returns an environment that contains the dataset(s).
142 """
--> 143 if self._datasets is None:
144 self._init_setlist()
146 if name not in self._datasets:
AttributeError: 'NULLType' object has no attribute '_datasets'
So I am stuck. How can I deserialise this <RTYPES.NILSXP> object from memory?
CodePudding user response:
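Add a custom converter for it:
import rpy2.rinterface as rinterface
from rpy2.robjects.conversion import Converter
# Conversion rules that turn R's NULL into Python's None.
local_rules = Converter('R NULL to None')
@local_rules.rpy2py.register(rinterface.NULLType)
def rpy2py_null(obj):
    return None
my_converter = robjects.default_converter + pandas2ri.converter + local_rules
with localconverter(my_converter):
    crs = Benchmarking.dea(data['Age'], data['CO2'], RTS='crs', ORIENTATION='in')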
CodePudding user response:
However there is more useful data that I would like to extract eg.
crs['dual']
_______________________________________________________________
<rpy2.rinterface_lib.sexp.NULLType object at 0x00000220BCB0D1C0> [RTYPES.NILSXP]
What kind of object is this?
This is R's NULL object. It is much like NULL in C, or null in Java or JavaScript. With the increasing use of Python for data science, None came to mean either the equivalent of NULL or a missing value (which is an NA or a variant of it in R). The default conversion returns an R NULL rather than converting to None in order to avoid confusion. This was an early design decision, though. If there is a need to revisit it, an issue should be opened on the project's page on GitHub.
Searching up RTYPES.NILSXP in the 3.5.3 docs takes me to a page in the docs which is the only mention I have found.
The mention on that page refers to the default value for a named argument:
https://rpy2.github.io/doc/latest/html/robjects_rpackages.html#rpy2.robjects.packages.PackageData
I see that the documentation is incomplete: lib_loc is an optional path for the class constructor indicating where the R package is installed.
I have no idea how to read this. The docs explain that datasets can be serialised R objects or serialised R code that produces the dataset. rpy2 employs 'lazy loading', and to load the data one must use the method fetch(), but I don't seem to be able to use it correctly to load the rest of the outputs from dea(x, y, *args).
What is meant here is that "data" objects in R packages are not necessarily serialized R data structures like the ones the R functions save(), dump(), or dput() can help produce. They can also be R scripts. For example, an R package can have a data object "myrandnorm100" that is an R script data/myrandnorm100.R in the installed package's directory, and that script will be evaluated using the R function source() (see https://rdrr.io/r/utils/data.html). That script can define an arbitrary number of variables. Note that serialized R data (for example in an .RData file) can also contain several named objects. The design choice for rpy2 was to try to make things a little safer and more predictable by keeping those names within a namespace. Silent name clashes can be at the root of challenging bugs in code.
(...)
rpy2.robjects.packages.PackageData.fetch(crs['dual'])
TypeError: PackageData.fetch() missing 1 required positional argument: 'name'
I've found fetch() method belongs to PackageData. I've tried to call it but now it asks me for the 'name' of this dataset??
Yes. The PackageData object is like a namespace with as many named objects as the author of the R package wanted to include.
I thought crs['dual'] was enough information. When I pass in 'dual' as the name parameter I get
rpy2.robjects.packages.PackageData.fetch(r_from_df_crs['dual'], 'dual')
File ~\anaconda3\envs\UROP_buildings_env\lib\site-packages\rpy2\robjects\packages.py:143, in PackageData.fetch(self, name)
136 def fetch(self, name):
137 """ Fetch the dataset (loads it or evaluates the R associated
138 with it.
139
140 In R, datasets are loaded into the global environment by default
141 but this function returns an environment that contains the dataset(s).
142 """
--> 143 if self._datasets is None:
144 self._init_setlist()
146 if name not in self._datasets:
AttributeError: 'NULLType' object has no attribute '_datasets'
Well, this is not how PackageData objects can be instantiated. I'll use the R package datasets as an example since it is part of the R standard library.
# Import the R package "datasets"
datasets = importr('datasets')
# That package only contains datasets. The Python object will look
# like it has no (useful) attributes. We can create an instance
# for the data in the package with:
datasets_data = rpackages.data(datasets)
# All dataset names are available through `datasets_data.names()`.
# We know the name of the one we want.
mtcars_env = datasets_data.fetch('mtcars')
# mtcars_env is an R "environment", wrapped as an `rpy2.robjects.Environment`
Note: I see now that the doc has that information, but scattered across examples rather than also on the page about packages (see https://rpy2.github.io/doc/latest/html/search.html?q=data fetch).