Currently I'm trying to copy a complex nested tibble from R to Python using the rpy2 package. Since Python does not handle nested data very well, I'm splitting my data into two parts (metadata and several time series) and converting the time series data into a 3D array within R. So far so good, but as you can see here, R handles the dimensions within the array differently from Python. I was hoping that rpy2 would transform the dimensions by itself, but as you can see in my MWE this is not the case:
import rpy2.robjects as ro
import numpy as np
from rpy2.robjects import numpy2ri
from rpy2.robjects import default_converter
from rpy2.robjects.conversion import localconverter
ro.r(
"""
f <- function() {
data1 <- c(
1, 2, 3, 4,
5, 6, 7, 8,
9, 10, 11, 12
)
data2 <- c(
10, 20, 30, 40,
50, 60, 70, 80,
90, 100, 110, 120
)
result <- array(
c(data1, data2),
dim = c(4, 3, 2)
)
print(result)
print(dim(result))
return(result)
}
"""
)
r_f = ro.globalenv["f"]
v_np = r_f()
print(type(v_np))
print("###################################")
with localconverter(default_converter + numpy2ri.converter) as cv:
    np_data_measurment = ro.conversion.rpy2py(v_np)
print(np_data_measurment)
print(type(np_data_measurment))
print(np_data_measurment.shape)
print("###################################")
np_good = np.array(
[
[
[1, 5, 9],
[2, 6, 10],
[3, 7, 11],
[4, 8, 12]],
[
[10, 50, 90],
[20, 60, 100],
[30, 70, 110],
[40, 80, 120]],
]
)
print(np_good)
print(type(np_good))
print(np_good.shape)
print("###################################")
print(np_data_measurment.reshape(2, 4, 3, order='F'))
This results in the following output:

, , 1

     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

, , 2

     [,1] [,2] [,3]
[1,]   10   50   90
[2,]   20   60  100
[3,]   30   70  110
[4,]   40   80  120

[1] 4 3 2
<class 'rpy2.robjects.vectors.FloatArray'>
###################################
[[[ 1. 10.]
[ 5. 50.]
[ 9. 90.]]
[[ 2. 20.]
[ 6. 60.]
[ 10. 100.]]
[[ 3. 30.]
[ 7. 70.]
[ 11. 110.]]
[[ 4. 40.]
[ 8. 80.]
[ 12. 120.]]]
<class 'numpy.ndarray'>
(4, 3, 2)
###################################
[[[ 1 5 9]
[ 2 6 10]
[ 3 7 11]
[ 4 8 12]]
[[ 10 50 90]
[ 20 60 100]
[ 30 70 110]
[ 40 80 120]]]
<class 'numpy.ndarray'>
(2, 4, 3)
###################################
[[[ 1. 9. 50.]
[ 3. 11. 70.]
[ 5. 10. 90.]
[ 7. 30. 110.]]
[[ 2. 10. 60.]
[ 4. 12. 80.]
[ 6. 20. 100.]
[ 8. 40. 120.]]]
Now I am looking for a way to translate my data from R to Python in a way that keeps the dimensionality of the R array. As you can see, I also included an example of how the ordering should look (np_good) and tried to reshape the bad one (but I would prefer an rpy2 way of reshaping the data).
Do you have any idea, maybe a custom converter, as to how one can copy 3D arrays from R to Python while keeping the dimensions intact?
CodePudding user response:
What this boils down to, IMO, is how R arrays and (C-based) numpy arrays are laid out in memory: R is column-major (the first index varies fastest), while numpy defaults to row-major (the last index varies fastest).
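To make the layout difference concrete, here is a small numpy-only sketch. It does not use rpy2; the names r_like and c_like are made up for illustration, and order="F" is used to imitate the way R's array(1:24, dim = c(4, 3, 2)) fills its values.

```python
import numpy as np

# Fill 1..24 the way R does: the first index varies fastest
# (column-major, also called Fortran order).
r_like = np.arange(1, 25).reshape((4, 3, 2), order="F")

# Fill the same values the default numpy way: the last index varies fastest
# (row-major, also called C order).
c_like = np.arange(1, 25).reshape((4, 3, 2), order="C")

# The second value lands in a different cell depending on the fill order.
print(r_like[1, 0, 0])
print(c_like[0, 0, 1])

# This is the 4x3 slice that R prints under ", , 1".
print(r_like[:, :, 0])
```

Note that indexing agrees between R and numpy (apart from R being 1-based): r_like[i, j, k] is the same element as R's result[i+1, j+1, k+1]. Only the order in which the slices are printed differs.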
A simple solution is to transpose your numpy array:
np_data_measurment.transpose((2,1,0))
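This reverses the axes to shape (2, 3, 4), so the values print in the order R stores them (each slice appears transposed relative to R's 4x3 printout):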
array([[[  1.,   2.,   3.,   4.],
        [  5.,   6.,   7.,   8.],
        [  9.,  10.,  11.,  12.]],

       [[ 10.,  20.,  30.,  40.],
        [ 50.,  60.,  70.,  80.],
        [ 90., 100., 110., 120.]]])
As long as you are not putting this transposed array back into R, you will be fine. (You need to retranspose if you are doing so.)
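If the goal is exactly the (2, 4, 3) layout of np_good from the question, a different axis permutation may be closer to what is wanted. The sketch below is numpy-only: the array named converted is a stand-in, built here with order="F", for the (4, 3, 2) array that the rpy2 conversion returned in the question, and the other names are illustrative.

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
          10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]

# Stand-in for the converted R array: shape (4, 3, 2), filled column-major.
converted = np.array(values, dtype=float).reshape((4, 3, 2), order="F")

# Move the slice axis to the front while keeping rows and columns in place.
slices_first = converted.transpose((2, 0, 1))
print(slices_first.shape)
print(slices_first[0])

# np.moveaxis expresses the same permutation.
same = np.moveaxis(converted, -1, 0)

# Undo the permutation before handing the array back to R.
restored = slices_first.transpose((1, 2, 0))
```

Here slices_first[0] and slices_first[1] are the two 4x3 matrices that R prints under ", , 1" and ", , 2". Both transpose and moveaxis return views, so no data is copied until a contiguous array is requested.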