R dplyr - combining a list of tibbles with different number of rows into a single tibble with list

After using the map function, I ended up with a list of tibbles with different numbers of rows. As suggested in the purrr documentation (https://purrr.tidyverse.org/reference/map_dfr.html?q=map_dfr#null), I used list_cbind() to convert them into a single tibble. However, because the tibbles have different numbers of rows, I get an error message.

A simplified example below:

a1 <- tibble(
  name1 = c(1,2,3)
)
a2 <- tibble(
  name2 = c(1,2,3)
)
a3 <- tibble(
  name3 = c(1,2)
)
A <- list(a1, a2, a3)

list_cbind(A)

and I get the following error message:

Error in `list_cbind()`:
! Can't recycle `..1` (size 3) to match `..3` (size 2).
Run `rlang::last_error()` to see where the error occurred.

I also tried the size argument, which the documentation describes as "An optional integer size to ensure that every input has the same size (i.e. number of rows)", but the same error still occurs.

list_cbind(list(a1, a2, a3), size = 2)

Any suggestions on how to do this using the tidyverse (or otherwise)?

CodePudding user response：

First, calculate the maximum number of rows across the data frames.

Next, pad every data frame that has fewer rows than that maximum with rows of NA values. Indexing rows with NA_integer_ returns all-NA rows, so the padding also works when a data frame has more than one column or is short by more than one row.

Finally, bind the padded data frames together by column. (If the data frames have more than one column, check that the column names are unique across them before binding.)

# Largest number of rows among the tibbles in A
dimMax <- max(sapply(A, nrow))

# Pad each tibble with all-NA rows until it has dimMax rows
B <- lapply(A, function(df) {
  pad <- df[rep(NA_integer_, dimMax - nrow(df)), ]
  dplyr::bind_rows(df, pad)
})

# Bind the padded tibbles column-wise, keeping the original column names
dplyr::bind_cols(B)

CodePudding user response：

A bit long, but it works

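library(tidyverse)

# Collect a1, a2, a3 from the workspace; the anchored pattern avoids
# picking up other objects whose names merely contain an "a"
mget(ls(pattern = "^a[0-9]+$")) %>%
  map_dfr(~ .x %>%
        mutate(row = row_number())) %>%
  pivot_longer(-row) %>%
  drop_na() %>%
  pivot_wider(names_from = name, values_from = value)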


# A tibble: 3 × 4
    row name1 name2 name3
  <int> <dbl> <dbl> <dbl>
1     1     1     1     1
2     2     2     2     2
3     3     3     3    NA

CodePudding user response：

list_cbind() requires all the datasets to have the same number of rows. We may use cbind.na from the qpcR package instead:

do.call(qpcR:::cbind.na, A)
  name1 name2 name3
1     1     1     1
2     2     2     2
3     3     3    NA
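
If installing qpcR only for this step is undesirable, the same padding idea can be sketched in base R plus tibble. This is a minimal sketch, not the package's implementation: it assumes the column names are unique across the tibbles, and the names cols, padded and res are illustrative only. It relies on `length<-` extending a vector with NA.

```r
library(tibble)

A <- list(
  tibble(name1 = c(1, 2, 3)),
  tibble(name2 = c(1, 2, 3)),
  tibble(name3 = c(1, 2))
)

# Flatten the list of tibbles into one named list of column vectors
cols <- do.call(c, lapply(A, as.list))

# Length of the longest column
mx <- max(lengths(cols))

# `length<-` pads each shorter vector with NA up to mx elements
padded <- lapply(cols, `length<-`, mx)

# Equal-length named vectors can now be turned into a single tibble
res <- as_tibble(padded)
res
```

Because the padding happens per column rather than per tibble, this handles tibbles with several columns without extra work.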

If we want to use list_cbind(), get the maximum number of rows, use it to expand each dataset with NA rows so that all of them are balanced, and then call list_cbind():

library(purrr)
library(dplyr)
# Maximum number of rows across the tibbles
mx <- max(map_int(A, nrow))
A %>%
  # Indexing up to mx fills the missing rows with NA
  map(~ .x[seq_len(mx), ]) %>%
  list_cbind()
# A tibble: 3 × 3
  name1 name2 name3
  <dbl> <dbl> <dbl>
1     1     1     1
2     2     2     2
3     3     3    NA
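
The same approach can be wrapped in a small reusable function. The sketch below is one possible way to do it, assuming purrr 1.0.0 or later for list_cbind(); pad_rows() and cbind_fill() are hypothetical helper names, not part of purrr or dplyr. It pads with NA_integer_ row indices rather than indexing past the last row, since some tibble versions may warn about out-of-range row indices.

```r
library(purrr)
library(tibble)

# Hypothetical helper: extend a tibble to n rows by appending all-NA rows
pad_rows <- function(df, n) {
  idx <- c(seq_len(nrow(df)), rep(NA_integer_, n - nrow(df)))
  df[idx, ]
}

# Hypothetical helper: pad every tibble to the longest one, then bind columns
cbind_fill <- function(dfs) {
  n <- max(map_int(dfs, nrow))
  list_cbind(map(dfs, pad_rows, n = n))
}

A <- list(
  tibble(name1 = c(1, 2, 3)),
  tibble(name2 = c(1, 2, 3)),
  tibble(name3 = c(1, 2))
)

res <- cbind_fill(A)
res
```

Keeping the padding in its own function makes it easy to call right after the original map() step, for example cbind_fill(map(inputs, f)), where inputs and f stand for whatever produced the list of tibbles.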