I have a dataset that was originally a Matlab .mat file. When I import it into R,
one of the elements in the list holding the .mat file contents seems to be a multidimensional array (4d, to be precise).
It looks something like num [1:41, 1:2400, 1:60, 1:6]. I know this has to do with 41 features that vary across 2400 trials, for each of 60 people who make one out of 6 choices in every trial.
From this, what I really want is just a 2d matrix or a data frame with 41 columns for the features, one column for the trial, one column for the person ID, and one column that stores the choice they made in that specific trial.
So in essence each row will show the values of all 41 features, the person ID, the trial ID and their choice. Eventually, I need to be able to share this in just one file, like a csv or txt.
Is there an efficient way to do this? So far my solution seems really convoluted and would take quite a few loops and if statements. Many thanks.
CodePudding user response:
Sample data ary, assuming it somewhat represents your larger matrix.
ary <- array(seq(prod(c(4,10,3,2))), dim = c(4,10,3,2))
ary
# , , 1, 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 5 9 13 17 21 25 29 33 37
# [2,] 2 6 10 14 18 22 26 30 34 38
# [3,] 3 7 11 15 19 23 27 31 35 39
# [4,] 4 8 12 16 20 24 28 32 36 40
# , , 2, 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 41 45 49 53 57 61 65 69 73 77
# [2,] 42 46 50 54 58 62 66 70 74 78
# [3,] 43 47 51 55 59 63 67 71 75 79
# [4,] 44 48 52 56 60 64 68 72 76 80
# , , 3, 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 81 85 89 93 97 101 105 109 113 117
# [2,] 82 86 90 94 98 102 106 110 114 118
# [3,] 83 87 91 95 99 103 107 111 115 119
# [4,] 84 88 92 96 100 104 108 112 116 120
# , , 1, 2
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 121 125 129 133 137 141 145 149 153 157
# [2,] 122 126 130 134 138 142 146 150 154 158
# [3,] 123 127 131 135 139 143 147 151 155 159
# [4,] 124 128 132 136 140 144 148 152 156 160
# , , 2, 2
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 161 165 169 173 177 181 185 189 193 197
# [2,] 162 166 170 174 178 182 186 190 194 198
# [3,] 163 167 171 175 179 183 187 191 195 199
# [4,] 164 168 172 176 180 184 188 192 196 200
# , , 3, 2
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 201 205 209 213 217 221 225 229 233 237
# [2,] 202 206 210 214 218 222 226 230 234 238
# [3,] 203 207 211 215 219 223 227 231 235 239
# [4,] 204 208 212 216 220 224 228 232 236 240
A three-step process for converting it:
tmp <- apply(ary, 3:4, function(z) as.data.frame(t(z)), simplify = FALSE)
eg <- do.call(expand.grid, lapply(dim(tmp), seq))
out <- do.call(rbind, Map(function(x, d3, d4) transform(x, dim3=d3, dim4=d4), c(tmp), eg[[1]], eg[[2]]))
dim(out)
# [1] 60 6
head(out)
# V1 V2 V3 V4 dim3 dim4
# 1 1 2 3 4 1 1
# 2 5 6 7 8 1 1
# 3 9 10 11 12 1 1
# 4 13 14 15 16 1 1
# 5 17 18 19 20 1 1
# 6 21 22 23 24 1 1
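As a quick sanity check (not part of the original steps), a sketch like the following compares one labelled block of out against the matching plane of ary. The names sub and labels_ok are made up for this illustration.

```r
# Rebuild the sample array and the flattened frame exactly as above.
ary <- array(seq(prod(c(4,10,3,2))), dim = c(4,10,3,2))
tmp <- apply(ary, 3:4, function(z) as.data.frame(t(z)), simplify = FALSE)
eg <- do.call(expand.grid, lapply(dim(tmp), seq))
out <- do.call(rbind, Map(function(x, d3, d4) transform(x, dim3=d3, dim4=d4), c(tmp), eg[[1]], eg[[2]]))

# Pick the rows labelled dim3 == 2, dim4 == 2 and keep only the feature columns.
sub <- out[out$dim3 == 2 & out$dim4 == 2, paste0("V", 1:4)]

# They should equal the transposed plane ary[, , 2, 2] element by element.
labels_ok <- all(as.matrix(sub) == t(ary[, , 2, 2]))
labels_ok
```

If labels_ok is TRUE for a few planes of your choosing, the dim3/dim4 columns are attached to the right rows.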
Details:
apply(ary, 3:4, ...) applies a function to each "plane" of the 4d array. The 3:4 refers to the 3rd and 4th dimensions, leaving the first two dims (rows and columns, respectively) intact. In a single call of the anonymous function, z is a 2d array such as ary[,,1,1]. The simplify = FALSE argument (available in R 4.1.0 and later) keeps each result as a data.frame inside a list instead of collapsing the results.
t transposes that plane so that it is 4 columns wide, which puts your features in the column dimension.
eg is just a mechanism to number the 3rd and 4th dimensions so that we can record the source of each plane. It is a frame with all combinations of 1:dim(ary)[3] against 1:dim(ary)[4], in the same order as what we get when we do c(tmp), preserving the numbered planes. Look at head(eg) to see how it is counting.
Map pairs the dim3/dim4 values in eg with each of the planes.
do.call(rbind, ...) takes a list of data.frames and produces a single frame. It is akin to rbind(x[[1]], x[[2]], x[[3]], ...), where x is the list returned by Map, but done in a way that is agnostic to the number of planes stored in that list.
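If building one data.frame per plane turns out to be slow on the full 41 x 2400 x 60 x 6 array, one alternative is to permute the dimensions with aperm and reshape in a single step. This is a different technique from the apply/Map approach above, offered only as a sketch. The names feat, ids, out2, csv_path and the file name flattened.csv are assumptions for illustration; the last line writes the single shareable file the question asks for.

```r
# Same sample array as in the answer.
ary <- array(seq(prod(c(4,10,3,2))), dim = c(4,10,3,2))
d <- dim(ary)

# Move the feature dimension last, so that within the flattened rows the
# 2nd dimension (trial) varies fastest, then the 3rd, then the 4th.
feat <- aperm(ary, c(2, 3, 4, 1))

# Collapse the first three dimensions into rows; features become columns.
dim(feat) <- c(prod(d[2:4]), d[1])
colnames(feat) <- paste0("V", seq_len(d[1]))

# Index columns in the same row order (expand.grid varies its first factor fastest).
ids <- expand.grid(trial = seq_len(d[2]), dim3 = seq_len(d[3]), dim4 = seq_len(d[4]))

out2 <- cbind(as.data.frame(feat), ids)

# Hypothetical output location; change the path as needed.
csv_path <- file.path(tempdir(), "flattened.csv")
write.csv(out2, csv_path, row.names = FALSE)
```

Unlike out, this version also carries an explicit trial column. On the real data the three index columns would run over 2400, 60 and 6 values respectively; how the last dimension maps to the choice column depends on how the original .mat file encodes it.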