Merge data frames in a list by condition in purrr

I have a list of data frames with the following structure:

list_example <- list(type1_a_b = data.frame(id = 1:3, a = 1:3, b = 4:6),
                     type1_c_d = data.frame(id = 1:5, c = 1:5, d = 5:9),
                     type2_e_f = data.frame(id = c(1,3,4), e = 1:3, f = 4:6),
                     type2_g_h = data.frame(id = c(2,3,4), g = 1:3, h = 5:7))

I also have a vector of data frame types:

data_types <- c("type1", "type2")

I would like to do a full join of data frames by type (using the data_types vector and id columns), preferably with purrr.

Desired output:

list(type1 = data.frame(id = 1:5,
                        a = c(1:3, NA, NA),
                        b = c(4:6, NA, NA),
                        c = 1:5,
                        d = 5:9),
     
     type2 = data.frame(id = c(1:4),
                        e = c(1, NA, 2, 3),
                        f = c(4, NA, 5, 6),
                        g = c(NA, 1:3),
                        h = c(NA, 5:7))
     )

$type1
  id  a  b c d
1  1  1  4 1 5
2  2  2  5 2 6
3  3  3  6 3 7
4  4 NA NA 4 8
5  5 NA NA 5 9

$type2
  id  e  f  g  h
1  1  1  4 NA NA
2  2 NA NA  1  5
3  3  2  5  2  6
4  4  3  6  3  7

I was able to reduce all list elements into one data frame with a solution from this post, but I would like the output in list format so that I can later work with the different data types separately.

library(dplyr)  # provides %>% and full_join

list_example %>%
  purrr::reduce(full_join, by = "id")

  id  a  b c d  e  f  g  h
1  1  1  4 1 5  1  4 NA NA
2  2  2  5 2 6 NA NA  1  5
3  3  3  6 3 7  2  5  2  6
4  4 NA NA 4 8  3  6  3  7
5  5 NA NA 5 9 NA NA NA NA

Thank you!

CodePudding user response：

We can split the list by the type prefix of its names, then loop over the resulting groups with map and reduce the data frames within each group with full_join:

library(dplyr)
library(stringr)
library(purrr)
list_example %>% 
   split(str_remove(names(.), "_.*")) %>% 
   map(~ reduce(.x, full_join, by = "id") %>%
       arrange(id))

-output

$type1
  id  a  b c d
1  1  1  4 1 5
2  2  2  5 2 6
3  3  3  6 3 7
4  4 NA NA 4 8
5  5 NA NA 5 9

$type2
  id  e  f  g  h
1  1  1  4 NA NA
2  2 NA NA  1  5
3  3  2  5  2  6
4  4  3  6  3  7
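
Since the question asks for a solution driven by the data_types vector, here is a minimal sketch along those lines. It assumes every list name has the form <type>_<suffix>; the startsWith() filter and the anonymous join function are illustrative choices, not part of the answer above.

```r
library(dplyr)
library(purrr)

list_example <- list(type1_a_b = data.frame(id = 1:3, a = 1:3, b = 4:6),
                     type1_c_d = data.frame(id = 1:5, c = 1:5, d = 5:9),
                     type2_e_f = data.frame(id = c(1,3,4), e = 1:3, f = 4:6),
                     type2_g_h = data.frame(id = c(2,3,4), g = 1:3, h = 5:7))
data_types <- c("type1", "type2")

result <- data_types %>%
  set_names() %>%   # name each element after itself so the output list is named
  map(function(type) {
    # assumption: names look like "<type>_<suffix>", so match on the "<type>_" prefix
    keep_these <- startsWith(names(list_example), paste0(type, "_"))
    list_example[keep_these] %>%
      reduce(function(x, y) full_join(x, y, by = "id")) %>%
      arrange(id)
  })

result
```

Types listed in data_types but absent from the list would leave nothing to reduce, so it may be worth checking for that case before relying on this.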

Or, using merge/Reduce in base R (the \(x) lambda shorthand requires R 4.1 or later):

lapply(split(list_example, sub("_.*", "", names(list_example))), 
       \(x) Reduce(\(...) merge(..., all = TRUE), x))

-output

$type1
  id  a  b c d
1  1  1  4 1 5
2  2  2  5 2 6
3  3  3  6 3 7
4  4 NA NA 4 8
5  5 NA NA 5 9

$type2
  id  e  f  g  h
1  1  1  4 NA NA
2  2 NA NA  1  5
3  3  2  5  2  6
4  4  3  6  3  7
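
Both versions group the list by the same key, so it can help to look at that key on its own. A small sketch, using only base R and the list from the question:

```r
list_example <- list(type1_a_b = data.frame(id = 1:3, a = 1:3, b = 4:6),
                     type1_c_d = data.frame(id = 1:5, c = 1:5, d = 5:9),
                     type2_e_f = data.frame(id = c(1,3,4), e = 1:3, f = 4:6),
                     type2_g_h = data.frame(id = c(2,3,4), g = 1:3, h = 5:7))

# drop everything from the first underscore onwards, leaving the type prefix
key <- sub("_.*", "", names(list_example))
key
# "type1" "type1" "type2" "type2"

# which list elements end up in which group
groups <- split(names(list_example), key)
groups
```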

CodePudding user response：

A base R option using lapply over the data_types vector:

nm <- names(list_example)

result <- lapply(data_types, function(x) 
            Reduce(function(p, q) merge(p, q, all = TRUE, by = 'id'), 
            list_example[grep(x, nm)]))
result

#[[1]]
#  id  a  b c d
#1  1  1  4 1 5
#2  2  2  5 2 6
#3  3  3  6 3 7
#4  4 NA NA 4 8
#5  5 NA NA 5 9

#[[2]]
#  id  e  f  g  h
#1  1  1  4 NA NA
#2  2 NA NA  1  5
#3  3  2  5  2  6
#4  4  3  6  3  7

If you want to name the result list, you may add:

names(result) <- data_types
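
One caveat: grep(x, nm) treats x as a regular expression and matches it anywhere in the name, so "type1" would also pick up a hypothetical "type10_..." element. A possible variant, sketched under the assumption that names always look like <type>_<suffix>, matches the literal prefix instead and names the result in the same step:

```r
list_example <- list(type1_a_b = data.frame(id = 1:3, a = 1:3, b = 4:6),
                     type1_c_d = data.frame(id = 1:5, c = 1:5, d = 5:9),
                     type2_e_f = data.frame(id = c(1,3,4), e = 1:3, f = 4:6),
                     type2_g_h = data.frame(id = c(2,3,4), g = 1:3, h = 5:7))
data_types <- c("type1", "type2")
nm <- names(list_example)

# lapply() keeps the names of its input, so naming data_types names the result
result <- lapply(setNames(data_types, data_types), function(x) {
  # startsWith() does a literal prefix match; the trailing "_" keeps
  # "type1" from matching names such as "type10_..."
  Reduce(function(p, q) merge(p, q, all = TRUE, by = "id"),
         list_example[startsWith(nm, paste0(x, "_"))])
})

names(result)
# "type1" "type2"
```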