Merge dataframes stored in two lists of the same length

Time:09-22

I have two long lists of large dataframes, and both lists have the same length. I want to merge Dataframe1 (from list1) with Dataframe1 (from list2), Dataframe2 (from list1) with Dataframe2 (from list2), and so on.

Below is a minimal reproducible example and some attempts.

#### EXAMPLE
#Create Dataframes (each vector is one row: SampleID, Measure1..Measure4)
df_1 <- as.data.frame(rbind(c("Bah",NA,2,3,4), c("Bug",NA,5,6,NA)))
df_2 <- as.data.frame(rbind(c("Blu",7,8,9,10), c(NA,NA,NA,12,13)))
df_3 <- as.data.frame(rbind(c("Bah",NA,21,32,43), c("Rgh",NA,51,63,NA)))
df_4 <- as.data.frame(rbind(c("Gar",7,8,9,10), c("Ghh",NA,NA,121,131)))

#Create Lists
list1 <- list(df_1,df_2)
list2 <- list(df_3,df_4)

#Set column and row names for each dataframe
colnames(list1[[1]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")
colnames(list1[[2]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")
colnames(list2[[1]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")
colnames(list2[[2]]) <-  c("SampleID","Measure1","Measure2","Measure3","Measure4")

rownames(list1[[1]]) <-  c("1","2")
rownames(list1[[2]]) <-  c("1","2")
rownames(list2[[1]]) <-  c("1","2")
rownames(list2[[2]]) <-  c("1","2")

My desired output is a list of the same length as the input lists, where each element is the pair of dataframes at that position merged into a single dataframe. The following produces the desired output, but it needs one line per pair, so it does not scale to long lists.

#### DESIRED OUTPUT
DesiredOutput_DF1_Format <- merge(list1[[1]],list2[[1]], all = TRUE, by = "SampleID")
DesiredOutput_DF2_Format <- merge(list1[[2]],list2[[2]], all = TRUE, by = "SampleID")
DesiredOutput_List <- list(DesiredOutput_DF1_Format, DesiredOutput_DF2_Format)

How can I generate an output list in my desired format in a high-throughput way, using an apply-like approach?

#### ATTEMPTS
#Attempt1:
attempt1 <- mapply(cbind, list1, list2, simplify=FALSE)

#Attempt2: 
My instinct is to use `lapply`, but I can't figure out how to make it iterate through two lists simultaneously.

#Attempt3: Works but the order of the output list appears inverted. This is not intuitive, though it is easily corrected... There has to be a cleaner way.
output_list <- list()
dataset_iterator <- seq_along(list1)

for (x in dataset_iterator) {
    df1 <- data.frame(list1[[x]])
    df2 <- data.frame(list2[[x]])
    df_merged <- data.frame(merge(df1, df2, by = "SampleID", all=TRUE))
    output_list <- append(output_list, list(df_merged), 0)
}

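In Attempt 3 the reversed order comes from the last argument of `append()`: that argument is the `after` position, and `after = 0` inserts each new element at the front of the list, so the pair merged last ends up first. Below is a minimal sketch of the same loop that instead preallocates the output with `vector("list", n)` and assigns by index, so position `x` of the output lines up with position `x` of the inputs. `toy1` and `toy2` are hypothetical stand-ins for `list1` and `list2`.

```r
# Hypothetical toy inputs standing in for list1 and list2
toy1 <- list(data.frame(SampleID = c("A", "B"), Measure1 = c(1, 2)),
             data.frame(SampleID = c("C", "D"), Measure1 = c(3, 4)))
toy2 <- list(data.frame(SampleID = c("A", "E"), Measure1 = c(5, 6)),
             data.frame(SampleID = c("C", "F"), Measure1 = c(7, 8)))

# Preallocate one slot per pair instead of growing the list with append()
output_list <- vector("list", length(toy1))

for (x in seq_along(toy1)) {
  # Full outer join of the x-th data frame from each list, stored at position x
  output_list[[x]] <- merge(toy1[[x]], toy2[[x]], by = "SampleID", all = TRUE)
}
```

Assigning with `output_list[[x]] <-` also avoids copying the whole list on every iteration, which `append()` does.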
Answer:

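Based on the code shown, we may need `Map` (or `mapply` with `SIMPLIFY = FALSE`). It calls `merge` on the first elements of both lists, then on the second elements, and so on, and returns the results as a list: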

out <- Map(merge, list1, list2, MoreArgs = list(all = TRUE, by = "SampleID"))

Checking against the expected output:

> identical(DesiredOutput_List, out)
[1] TRUE
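The `mapply` form mentioned above gives the same result: `Map` is a thin wrapper that calls `mapply` with `SIMPLIFY = FALSE`. The argument is upper-case `SIMPLIFY`; the lower-case `simplify = FALSE` in Attempt 1 is not an `mapply` argument, so it is passed along to `cbind` as an extra input (and `cbind` places columns side by side without matching rows on `SampleID` anyway). A minimal sketch on hypothetical toy inputs `toy1` and `toy2`:

```r
# Hypothetical toy inputs standing in for list1 and list2
toy1 <- list(data.frame(SampleID = c("A", "B"), Measure1 = c(1, 2)),
             data.frame(SampleID = c("C", "D"), Measure1 = c(3, 4)))
toy2 <- list(data.frame(SampleID = c("A", "E"), Measure1 = c(5, 6)),
             data.frame(SampleID = c("C", "F"), Measure1 = c(7, 8)))

# Upper-case SIMPLIFY = FALSE keeps the result as a plain list of data frames;
# MoreArgs holds the arguments that are the same for every call to merge()
out_mapply <- mapply(merge, toy1, toy2,
                     MoreArgs = list(all = TRUE, by = "SampleID"),
                     SIMPLIFY = FALSE)

# Map() builds the same list
out_map <- Map(merge, toy1, toy2, MoreArgs = list(all = TRUE, by = "SampleID"))
```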

Or, using the tidyverse (`map2` from purrr, `full_join` from dplyr):

library(purrr)
library(dplyr)
map2(list1, list2, full_join, by = "SampleID")
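The `lapply` idea from Attempt 2 also works if you iterate over positions instead of over either list: `seq_along()` yields 1, 2, ..., and the function indexes into both lists with the same position. The sketch below, on hypothetical toy inputs `toy1` and `toy2`, also checks up front that the lists have the same length, because `Map`/`mapply` recycle a shorter argument rather than stopping.

```r
# Hypothetical toy inputs standing in for list1 and list2
toy1 <- list(data.frame(SampleID = c("A", "B"), Measure1 = c(1, 2)),
             data.frame(SampleID = c("C", "D"), Measure1 = c(3, 4)))
toy2 <- list(data.frame(SampleID = c("A", "E"), Measure1 = c(5, 6)),
             data.frame(SampleID = c("C", "F"), Measure1 = c(7, 8)))

# Stop early if the lists are not paired one-to-one
stopifnot(length(toy1) == length(toy2))

# Iterate over positions and merge the matching element from each list
out_lapply <- lapply(seq_along(toy1), function(i) {
  merge(toy1[[i]], toy2[[i]], all = TRUE, by = "SampleID")
})
```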