How to cbind a list of tables by one column, and suffix headings with the list item name-CodePudding

I've got a list of dataframes. I'd like to cbind them by the index column, sample_id. Each table has the same column headings, so I can't just cbind them otherwise I won't know which list item the columns came from. The name of the list item gives the measure used to generate them, so I'd like to suffix the column headings with the list item name.

Here's a simplified demo list of dataframes:

list_of_tables <- list(number = structure(list(sample_id = structure(1:3, levels = c("CSF_1", 
"CSF_2", "CSF_4"), class = "factor"), total = c(655, 331, 271
), max = c(12, 5, 7)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame")), concentration_cm_3 = structure(list(sample_id = structure(1:3, levels = c("CSF_1", 
"CSF_2", "CSF_4"), class = "factor"), total = c(121454697, 90959097, 
43080697), max = c(2050000, 2140000, 915500)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame")), volume_nm_3 = structure(list(
    sample_id = structure(1:3, levels = c("CSF_1", "CSF_2", "CSF_4"
    ), class = "factor"), total = c(2412783009, 1293649395, 438426087
    ), max = c(103500000, 117400000, 23920000)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame")), area_nm_2 = structure(list(
    sample_id = structure(1:3, levels = c("CSF_1", "CSF_2", "CSF_4"
    ), class = "factor"), total = c(15259297.4, 7655352.2, 3775922
    ), max = c(266500, 289900, 100400)), row.names = c(NA, -3L
), class = c("tbl_df", "tbl", "data.frame")))

You'll see it's a list of 4 tables, and the list item names are "number", "concentration_cm_3", "volume_nm_3", and "area_nm_2".

Using join_all from plyr I can merge them all by sample_id. However, how do I suffix with the list item name?

merged_tables <- plyr::join_all(stats_by_measure, by = "sample_id", type = "left")

CodePudding user response：

we could do it this way: The trick is to use .id = 'id' in bind_rows which adds the name as a column. Then we could pivot:

library(dplyr)
library(tidyr)

bind_rows(list_of_tables, .id = 'id') %>% 
  pivot_wider(names_from = id,
              values_from = c(total, max))

  sample_id total_number total_concentration_cm_3 total_volume_nm_3 total_area_nm_2 max_number max_concentration_cm_3 max_volume_nm_3 max_area_nm_2
  <fct>            <dbl>                    <dbl>             <dbl>           <dbl>      <dbl>                  <dbl>           <dbl>         <dbl>
1 CSF_1              655                121454697        2412783009       15259297.         12                2050000       103500000        266500
2 CSF_2              331                 90959097        1293649395        7655352.          5                2140000       117400000        289900
3 CSF_4              271                 43080697         438426087        3775922           7                 915500        23920000        100400

CodePudding user response：

Probably, we may use reduce2 here with suffix option from left_join

library(dplyr)
library(purrr)
nm <- names(list_of_tables)[1]
reduce2(list_of_tables, names(list_of_tables)[-1], 
   function(x, y, z) left_join(x, y, by = 'sample_id', suffix = c(nm, z)))

Or if we want to use join_all, probably we can rename the columns before doing the join

library(stringr)
imap(list_of_tables, ~ {
   nm <- .y
   .x %>% rename_with(~str_c(.x, nm), -1)
  }) %>%
   plyr::join_all( by = "sample_id", type = "left")

Or use a for loop

tmp <- list_of_tables[[1]]
names(tmp)[-1] <- paste0(names(tmp)[-1], names(list_of_tables)[1])
for(nm in names(list_of_tables)[-1]) {
   tmp2 <- list_of_tables[[nm]]
   names(tmp2)[-1] <- paste0(names(tmp2)[-1], nm)
   tmp <- left_join(tmp, tmp2, by = "sample_id")
}
tmp