I have a data frame with 4 columns and for each row, I want to extract 2 of the 4 columns (but for each row, it's going to be different columns).
repro = structure(list(c1 = c(0L, 0L, 1L, 1L, 0L, 1L), c2 = c(1L, 1L,
0L, 0L, 1L, 1L), c1 = c(0L, 1L, 1L, 0L, 1L, 0L), c2 = c(0L, 1L,
1L, 1L, 1L, 0L)), row.names = c(86L, 59L, 58L, 79L, 70L, 83L),
class = "data.frame")
head(repro)
   c1 c2 c1 c2
86  0  1  0  0
59  0  1  1  1
58  1  0  1  1
79  1  0  0  1
70  0  1  1  1
83  1  1  0  0
Vectors of columns to select in the repro data frame:
col.sel1 = c(2, 1, 2, 2, 2, 2)
col.sel2 = c(4, 3, 3, 4, 3, 3)
A for loop to select the columns (it works, but on my original data it takes forever, as there are thousands of rows):
# Make offspring table
offspring = NULL
for (i in 1:nrow(repro)) {
  offs = cbind(c3 = repro[i, col.sel1[i]],
               c4 = repro[i, col.sel2[i]])
  offspring = rbind(offspring, offs)
}
head(offspring)
Giving
     c3 c4
[1,]  1  0
[2,]  0  1
[3,]  0  1
[4,]  0  1
[5,]  1  1
[6,]  1  0
Is there a faster way to select different columns for each row based on the two vectors col.sel1 and col.sel2?
I've tried:
repro[1:6, col.sel1]
lapply(col.sel1, function(x) repro[, x])
But neither gives the expected result.
CodePudding user response:
You can [-index frames/matrices with a matrix:
cbind(
  c3 = repro[cbind(seq_along(col.sel1), col.sel1)],
  c4 = repro[cbind(seq_along(col.sel2), col.sel2)]
)
#      c3 c4
# [1,]  1  0
# [2,]  0  1
# [3,]  0  1
# [4,]  0  1
# [5,]  1  1
# [6,]  1  0
Diving in, we see
cbind(seq_along(col.sel1), col.sel1)
#        col.sel1
# [1,] 1        2
# [2,] 2        1
# [3,] 3        2
# [4,] 4        2
# [5,] 5        2
# [6,] 6        2
Which means that the first value we want is row 1 column 2; then row 2, column 1; etc. The resulting values (for the first set) are:
repro[cbind(seq_along(col.sel1), col.sel1)]
# [1] 1 0 0 0 1 1
We can then combine those with cbind (into a matrix, easily converted to a frame by replacing cbind with data.frame).
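To see the matrix-indexing rule in isolation, here is a minimal sketch on a small made-up matrix. The names m, idx, and picked are hypothetical and not part of the question; the point is only that each row of a two-column index matrix is read as one (row, column) pair.

```r
# Hypothetical 3x3 integer matrix, filled column-wise with 1..9
m <- matrix(1:9, nrow = 3)

# Each row of idx is one (row, column) pair: (1,2), (2,1), (3,3)
idx <- cbind(1:3, c(2, 1, 3))

# One element is returned per row of idx, as a plain vector
picked <- m[idx]
print(picked)
```

The same rule applies to a data frame, which is what the answer above relies on.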
If you have an arbitrary number of these selection vectors (zero or more), you can automate this with:
L <- list(c3=col.sel1, c4=col.sel2)
data.frame(lapply(L, function(z) repro[cbind(seq_along(z), z)]))
#   c3 c4
# 1  1  0
# 2  0  1
# 3  0  1
# 4  0  1
# 5  1  1
# 6  1  0
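Since the original concern was speed on thousands of rows, one possible refinement is to convert the frame to a matrix once and index that, because matrix-indexing a data frame converts it to a matrix on each call. The sketch below assumes all columns share one type (true for repro, which is all integers); the names m, fast, and slow are introduced here for illustration only. It also checks the result against the loop from the question.

```r
# Data and selection vectors from the question
repro <- structure(list(c1 = c(0L, 0L, 1L, 1L, 0L, 1L), c2 = c(1L, 1L, 0L, 0L, 1L, 1L),
                        c1 = c(0L, 1L, 1L, 0L, 1L, 0L), c2 = c(0L, 1L, 1L, 1L, 1L, 0L)),
                   row.names = c(86L, 59L, 58L, 79L, 70L, 83L), class = "data.frame")
col.sel1 <- c(2, 1, 2, 2, 2, 2)
col.sel2 <- c(4, 3, 3, 4, 3, 3)

# Convert once (assumption: columns are of a single type, so nothing is coerced)
m <- as.matrix(repro)

# Vectorized: one matrix-index lookup per selection vector
L <- list(c3 = col.sel1, c4 = col.sel2)
fast <- do.call(cbind, lapply(L, function(z) m[cbind(seq_along(z), z)]))

# Loop from the question, kept only as a reference for comparison
slow <- NULL
for (i in seq_len(nrow(repro))) {
  slow <- rbind(slow, cbind(c3 = repro[i, col.sel1[i]], c4 = repro[i, col.sel2[i]]))
}

# Stops with an error if the two approaches disagree
stopifnot(isTRUE(all.equal(fast, slow)))
print(fast)
```

Whether the one-time conversion makes a noticeable difference depends on the size of the data, so it is worth timing on the real frame.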
Side note: you used 1:nrow(repro), but it is safer to use seq_along(col.sel1) instead: this allows selecting a number of values different from the number of rows. I recognize that in this use case you likely intend exactly and always one value per row, but it's still a safer alternative. (For example, repro[cbind(1:3, 1:4)] will not work correctly due to the unequal lengths of the vectors.)
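A small sketch of that last point, using a hypothetical selection vector sel: cbind() recycles the shorter vector and raises a warning when the lengths do not divide evenly, whereas seq_along() always produces exactly one row index per selection.

```r
# Unequal lengths: cbind() recycles 1:3 against 1:4 and warns.
# tryCatch() turns that warning into its message text so it can be inspected.
res <- tryCatch(cbind(1:3, 1:4), warning = function(w) conditionMessage(w))
print(res)

# seq_along() ties the row index to the selection vector itself
sel <- c(2, 1, 2)                  # hypothetical selection vector
idx <- cbind(seq_along(sel), sel)  # one (row, column) pair per entry of sel
print(nrow(idx) == length(sel))
```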