How to merge datasets in a list by rows in R?

Time:03-02

Similar questions have been asked before, but none of them seem to help with my specific situation. I'm trying to merge a bunch of datasets that are stored in a list and turn the result into a matrix, and I want to merge them by row. For example, suppose we have some data that looks like this:

set.seed(100)

# Build a list of three small data frames filled with random integers
dfList <- list()
for(i in 1:3){
  dfList[[i]] <- data.frame(
    x1 = sample(1:10, 3, replace = TRUE),
    x2 = sample(1:10, 3, replace = TRUE)
    )
}

> dfList
[[1]]
  x1 x2
1 10  3
2  7  9
3  6 10

[[2]]
  x1 x2
1  7  4
2  6  7
3  6  6

[[3]]
  x1 x2
1  2  7
2  7  8
3  7  2

I am trying to interleave the rows of the datasets and turn the result into a matrix. That is, the 1st row of the new matrix should come from the 1st row of the 1st data frame in the list, the 2nd row of the new matrix from the 1st row of the 2nd data frame, and so on. Once the 1st rows of all data frames are used up, the 2nd rows follow in the same order.

So, using the above example, my desired output would look like:

      x1 x2
 [1,] 10  3
 [2,]  7  4
 [3,]  2  7
 [4,]  7  9
 [5,]  6  7
 [6,]  7  8
 [7,]  6 10
 [8,]  6  6
 [9,]  7  2

Any suggestions as to how I could do this?

CodePudding user response:

Use abind like this:

library(abind)
matrix(t(abind(dfList)), ncol = 2, byrow = TRUE)

giving:

      [,1] [,2]
 [1,]   10    3
 [2,]    7    4
 [3,]    2    7
 [4,]    7    9
 [5,]    6    7
 [6,]    7    8
 [7,]    6   10
 [8,]    6    6
 [9,]    7    2

or with only base R:

matrix(t(do.call("cbind", dfList)), ncol = 2, byrow = TRUE)
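
Note that both one-liners above drop the column names (the output shows [,1] and [,2] rather than x1 and x2). A minimal sketch of one way to keep them, assuming every data frame in the list has the same columns and the same number of rows, is to pass dimnames to matrix(). The list dfs below is a hypothetical fixed input, used so the result does not depend on the random number generator:

```r
# Hypothetical fixed input (not from the question's random draw).
dfs <- list(
  data.frame(x1 = c(10L, 7L, 6L), x2 = c(3L, 9L, 10L)),
  data.frame(x1 = c(7L, 6L, 6L),  x2 = c(4L, 7L, 6L)),
  data.frame(x1 = c(2L, 7L, 7L),  x2 = c(7L, 8L, 2L))
)

# Put the frames side by side: 3 rows x 6 columns.
wide <- as.matrix(do.call(cbind, dfs))

# Reading `wide` row by row, two values at a time, visits row 1 of every
# frame first, then row 2 of every frame, and so on.
res <- matrix(
  t(wide),
  ncol = ncol(dfs[[1]]),
  byrow = TRUE,
  dimnames = list(NULL, names(dfs[[1]]))  # restore the column names
)
res
```

The transpose is what makes this work: matrix() consumes its input column by column, so t(wide) is read in the row-by-row order of the wide table.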

CodePudding user response:

You can do:

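library(tidyverse)
# Tag each row with its position inside its data frame (id1) and with the
# position of that data frame in the list (id2), then sort on both.
new_matrix <- lapply(seq_along(dfList),
                     function(i) mutate(dfList[[i]], id1 = row_number(), id2 = i)) %>%
  bind_rows() %>%
  arrange(id1, id2) %>%
  select(-id1, -id2) %>%
  as.matrix()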



      x1 x2
 [1,] 10  3
 [2,]  7  4
 [3,]  2  7
 [4,]  7  9
 [5,]  6  7
 [6,]  7  8
 [7,]  6 10
 [8,]  6  6
 [9,]  7  2

CodePudding user response:

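# For each row index x, take row x from every data frame and stack them,
# then stack those blocks in row-index order.
do.call(rbind, lapply(seq_len(nrow(dfList[[1]])), function(x) {
  do.call(rbind, lapply(dfList, function(y) y[x, ]))
}))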

   x1 x2
1  10  3
2   7  4
3   2  7
23  7  9
21  6  7
22  7  8
33  6 10
31  6  6
32  7  2

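This achieves what you want, assuming all data frames have the same number of rows. Note that the result is a data frame, so wrap it in as.matrix() if you need a matrix.

The question does not say what should happen when the number of rows differs between data frames.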
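
For the unequal case, here is a rough sketch under one possible rule, which is my assumption and not something the question specifies: a data frame that has run out of rows is simply skipped. The list dfs is a hypothetical input with 2, 3 and 1 rows:

```r
# Hypothetical input with unequal row counts (2, 3 and 1 rows).
dfs <- list(
  data.frame(x1 = c(1L, 2L),     x2 = c(11L, 12L)),
  data.frame(x1 = c(3L, 4L, 5L), x2 = c(13L, 14L, 15L)),
  data.frame(x1 = 6L,            x2 = 16L)
)

# Within-frame row index of every stacked row: 1 2 | 1 2 3 | 1
row_in_frame <- sequence(vapply(dfs, nrow, integer(1)))

# Stack all frames, then reorder so that all first rows come first, all
# second rows next, and so on. order() leaves ties in their original
# order, so rows with the same index stay in list order.
stacked <- do.call(rbind, dfs)
res <- as.matrix(stacked[order(row_in_frame), ])
rownames(res) <- NULL  # drop the leftover data frame row names
res
```

With equal row counts this reduces to the same interleaving as above. Other rules, such as padding short data frames with NA, would need a different approach.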

CodePudding user response:

Two other ways to order by row number after an rbind.

set.seed(100)

dfList <- list()
for(i in 1:3){
  dfList[[i]] <- data.frame(
    x1 = sample(1:10, 3, replace = TRUE),
    x2 = sample(1:10, 3, replace = TRUE)
    )
}

library(data.table)
rbindlist(dfList, idcol = 'x')[order(rowid(x)), -'x']
#>       x1    x2
#>    <int> <int>
#> 1:    10     3
#> 2:     7     4
#> 3:     2     7
#> 4:     7     9
#> 5:     6     7
#> 6:     7     8
#> 7:     6    10
#> 8:     6     6
#> 9:     7     2

Created on 2022-03-01 by the reprex package (v2.0.1)

do.call(rbind, dfList)[order(unlist(lapply(dfList, function(x) seq(nrow(x))))),]
#>   x1 x2
#> 1 10  3
#> 4  7  4
#> 7  2  7
#> 2  7  9
#> 5  6  7
#> 8  7  8
#> 3  6 10
#> 6  6  6
#> 9  7  2

CodePudding user response:

Here's my attempt -

library(dplyr)

lapply(dfList, asplit, 1) %>% purrr::transpose() %>% bind_rows()

#    x1    x2
#  <int> <int>
#1    10     3
#2     7     4
#3     2     7
#4     7     9
#5     6     7
#6     7     8
#7     6    10
#8     6     6
#9     7     2

If you need a matrix as output, add %>% as.matrix() to the end of the chain.
