From a 4D MATLAB array to a 2D matrix or data frame in R

Time:04-07

I have a dataset that was originally a MATLAB .mat file. When I import it into R, one of the elements in the list holding the .mat file contents turns out to be a multidimensional array (4D, to be precise).

It looks something like num [1:41, 1:2400, 1:60, 1:6]. I know this corresponds to 41 features that vary over 2400 trials, for each of 60 people who make one out of 6 choices in every trial.

From this, what I really want is just a 2D matrix or a data frame with 41 columns for the features, one column for the trial, one column for the person ID, and one column that stores the choice they made in that specific trial.

So in essence each row will show the value of all 41 features, the person ID, the trial Id and their choice. Eventually, I need to be able to share this in just one file like a csv or txt.

Is there an efficient way to do this? So far my solution seems really convoluted and would take quite a few loops and if statements. Many thanks.

CodePudding user response:

Sample data, ary, assuming it roughly represents your larger array (here with dimensions 4, 10, 3, and 2).

ary <- array(seq(prod(c(4,10,3,2))), dim = c(4,10,3,2))
ary
# , , 1, 1
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    5    9   13   17   21   25   29   33    37
# [2,]    2    6   10   14   18   22   26   30   34    38
# [3,]    3    7   11   15   19   23   27   31   35    39
# [4,]    4    8   12   16   20   24   28   32   36    40
# , , 2, 1
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]   41   45   49   53   57   61   65   69   73    77
# [2,]   42   46   50   54   58   62   66   70   74    78
# [3,]   43   47   51   55   59   63   67   71   75    79
# [4,]   44   48   52   56   60   64   68   72   76    80
# , , 3, 1
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]   81   85   89   93   97  101  105  109  113   117
# [2,]   82   86   90   94   98  102  106  110  114   118
# [3,]   83   87   91   95   99  103  107  111  115   119
# [4,]   84   88   92   96  100  104  108  112  116   120
# , , 1, 2
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]  121  125  129  133  137  141  145  149  153   157
# [2,]  122  126  130  134  138  142  146  150  154   158
# [3,]  123  127  131  135  139  143  147  151  155   159
# [4,]  124  128  132  136  140  144  148  152  156   160
# , , 2, 2
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]  161  165  169  173  177  181  185  189  193   197
# [2,]  162  166  170  174  178  182  186  190  194   198
# [3,]  163  167  171  175  179  183  187  191  195   199
# [4,]  164  168  172  176  180  184  188  192  196   200
# , , 3, 2
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]  201  205  209  213  217  221  225  229  233   237
# [2,]  202  206  210  214  218  222  226  230  234   238
# [3,]  203  207  211  215  219  223  227  231  235   239
# [4,]  204  208  212  216  220  224  228  232  236   240

A three-step process for converting it:

tmp <- apply(ary, 3:4, function(z) as.data.frame(t(z)), simplify = FALSE)
eg <- do.call(expand.grid, lapply(dim(tmp), seq))
out <- do.call(rbind, Map(function(x, d3, d4) transform(x, dim3=d3, dim4=d4), c(tmp), eg[[1]], eg[[2]]))
dim(out)
# [1] 60  6
head(out)
#   V1 V2 V3 V4 dim3 dim4
# 1  1  2  3  4    1    1
# 2  5  6  7  8    1    1
# 3  9 10 11 12    1    1
# 4 13 14 15 16    1    1
# 5 17 18 19 20    1    1
# 6 21 22 23 24    1    1
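Since the question asks for a single shareable file, a frame like this can be written out with write.csv. What follows is only a minimal sketch: the small frame is a stand-in with the same column layout as out, and the path comes from tempfile() purely for illustration (both are assumptions, not part of the original answer).

```r
# Stand-in for the combined frame (same column layout as `out` above).
out <- data.frame(V1 = c(1L, 5L, 9L), V2 = c(2L, 6L, 10L),
                  V3 = c(3L, 7L, 11L), V4 = c(4L, 8L, 12L),
                  dim3 = 1L, dim4 = 1L)

# Hypothetical output path; replace tempfile() with a real file name.
path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps R's automatic row numbers out of the file.
write.csv(out, path, row.names = FALSE)

# Read it back to confirm the shape and headers survived the round trip.
back <- read.csv(path)
stopifnot(identical(dim(back), dim(out)),
          identical(names(back), names(out)))
```

For a tab-separated .txt file, write.table(out, path, sep = "\t", row.names = FALSE) would be the analogous call.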

Details:

  • apply(ary, 3:4, ...) applies a function to each "plane" of the 4d array. The 3:4 refers to the 3rd and 4th dimensions, so each call receives the first two dims (rows and columns, respectively) intact. In a single call of the anonymous function, z is a 2d array such as ary[,,1,1]. t transposes it so that it is 4 columns wide, which puts your features in the column dimension. simplify = FALSE (available in R 4.1.0 and later) keeps the result as a list of frames instead of collapsing it.

  • eg is just a mechanism to number the 3rd and 4th dimensions so that we can record the source of each plane. This returns a frame with all combinations of 1:dim(ary)[3] against 1:dim(ary)[4], and in the same order as what we get when we do c(tmp), preserving the numbered planes. Look at head(eg) to see how it is counting.

  • Map pairs each plane with its dim3/dim4 values from eg, and transform adds those values as columns.

  • do.call(rbind, ...) takes a list of data.frames and produces a single frame. It is akin to rbind(lst[[1]], lst[[2]], lst[[3]], ...), where lst is the list returned by Map, but done in a way that is agnostic to the number of planes in that list.
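The result above records dim3 and dim4 but not the trial number, which the question also asked for. As a hedged alternative (not the method used in the answer), the same reshaping can be sketched with aperm and matrix, which avoids building one frame per plane and makes it easy to add a trial column. The names m, idx, and out2 are chosen here for illustration, and the reading of dimension 2 as trials is an assumption taken from the question.

```r
# Stand-in array with the same layout as the sample above:
# dim 1 = features, dim 2 = trials, dims 3 and 4 = the two trailing dimensions.
ary <- array(seq(prod(c(4, 10, 3, 2))), dim = c(4, 10, 3, 2))
d <- dim(ary)

# Move the feature dimension last, then fold the other three into rows.
# R fills column-major, so rows run with trial fastest, then dim3, then dim4.
m <- matrix(aperm(ary, c(2, 3, 4, 1)), ncol = d[1])
colnames(m) <- paste0("V", seq_len(d[1]))

# Index columns in the same order as the rows of m
# (expand.grid varies its first argument fastest).
idx <- expand.grid(trial = seq_len(d[2]),
                   dim3  = seq_len(d[3]),
                   dim4  = seq_len(d[4]))

out2 <- cbind(as.data.frame(m), idx)

# Spot check: row 11 is trial 1 of plane [, , 2, 1].
stopifnot(all(unlist(out2[11, 1:4]) == ary[, 1, 2, 1]))
```

On the full 41 x 2400 x 60 x 6 array this would give 864,000 rows and 44 columns. Which of the trailing dimensions holds the person ID, and how the recorded choice is encoded, would need to be confirmed against the original .mat file.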
