I have a list of data frames with the following structure:
list_example <- list(type1_a_b = data.frame(id = 1:3, a = 1:3, b = 4:6),
type1_c_d = data.frame(id = 1:5, c = 1:5, d = 5:9),
type2_e_f = data.frame(id = c(1,3,4), e = 1:3, f = 4:6),
type2_g_h = data.frame(id = c(2,3,4), g = 1:3, h = 5:7))
I also have a vector of data frame types:
data_types <- c("type1", "type2")
I would like to full-join the data frames within each type (grouping them with the data_types vector and joining on the id column), preferably with purrr.
Desired output:
list(type1 = data.frame(id = 1:5,
a = c(1:3, NA, NA),
b = c(4:6, NA, NA),
c = 1:5,
d = 5:9),
type2 = data.frame(id = c(1:4),
e = c(1, NA, 2, 3),
f = c(4, NA, 5, 6),
g = c(NA, 1:3),
h = c(NA, 5:7))
)
$type1
id a b c d
1 1 1 4 1 5
2 2 2 5 2 6
3 3 3 6 3 7
4 4 NA NA 4 8
5 5 NA NA 5 9
$type2
id e f g h
1 1 1 4 NA NA
2 2 NA NA 1 5
3 3 2 5 2 6
4 4 3 6 3 7
I was able to reduce all list elements into one data frame with a solution from this post, but I would like to have the output in the list format to later work with different data types separately.
list_example %>%
purrr::reduce(full_join, by = "id")
id a b c d e f g h
1 1 1 4 1 5 1 4 NA NA
2 2 2 5 2 6 NA NA 1 5
3 3 3 6 3 7 2 5 2 6
4 4 NA NA 4 8 3 6 3 7
5 5 NA NA 5 9 NA NA NA NA
Thank you!
CodePudding user response:
We can split the list by the type prefix of its names, then loop over the resulting outer list with map and reduce each inner nested list with full_join:
library(dplyr)
library(stringr)
library(purrr)
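list_example %>%
  # group the data frames by the part of each name before the first "_"
  split(str_remove(names(.), "_.*")) %>%
  # within each group, full-join on id and sort the rows by id
  map(~ reduce(.x, full_join, by = "id") %>%
        arrange(id))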
-output
$type1
id a b c d
1 1 1 4 1 5
2 2 2 5 2 6
3 3 3 6 3 7
4 4 NA NA 4 8
5 5 NA NA 5 9
$type2
id e f g h
1 1 1 4 NA NA
2 2 NA NA 1 5
3 3 2 5 2 6
4 4 3 6 3 7
Or using merge/Reduce in base R:
lapply(split(list_example, sub("_.*", "", names(list_example))),
\(x) Reduce(\(...) merge(..., all = TRUE), x))
-output
$type1
id a b c d
1 1 1 4 1 5
2 2 2 5 2 6
3 3 3 6 3 7
4 4 NA NA 4 8
5 5 NA NA 5 9
$type2
id e f g h
1 1 1 4 NA NA
2 2 NA NA 1 5
3 3 2 5 2 6
4 4 3 6 3 7
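Both variants above derive the groups from the list names. If the groups should instead be driven by the data_types vector, as the question asks, a minimal sketch could look like the following. It assumes every element name starts with the type followed by an underscore; the name joined_by_type is only illustrative.

```r
library(dplyr)
library(purrr)

# Example data copied from the question
list_example <- list(
  type1_a_b = data.frame(id = 1:3, a = 1:3, b = 4:6),
  type1_c_d = data.frame(id = 1:5, c = 1:5, d = 5:9),
  type2_e_f = data.frame(id = c(1, 3, 4), e = 1:3, f = 4:6),
  type2_g_h = data.frame(id = c(2, 3, 4), g = 1:3, h = 5:7)
)
data_types <- c("type1", "type2")

# Assumption: each element name has the form "<type>_...".
# set_names() names the vector by its own values, so map() returns a named list.
joined_by_type <- data_types %>%
  set_names() %>%
  map(function(type) {
    # keep only the data frames that belong to this type
    list_example[startsWith(names(list_example), paste0(type, "_"))] %>%
      reduce(full_join, by = "id") %>%
      arrange(id)
  })

joined_by_type
```

Note that reduce() raises an error on an empty list, so a type in data_types that matches no data frame would need separate handling.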
CodePudding user response:
A base R option using lapply:
nm <- names(list_example)
result <- lapply(data_types, function(x)
Reduce(function(p, q) merge(p, q, all = TRUE, by = 'id'),
list_example[grep(x, nm)]))
result
#[[1]]
# id a b c d
#1 1 1 4 1 5
#2 2 2 5 2 6
#3 3 3 6 3 7
#4 4 NA NA 4 8
#5 5 NA NA 5 9
#[[2]]
# id e f g h
#1 1 1 4 NA NA
#2 2 NA NA 1 5
#3 3 2 5 2 6
#4 4 3 6 3 7
If you want to name the result list, you may add:
names(result) <- data_types
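One thing to keep in mind is that grep(x, nm) matches the pattern anywhere in the name, so a type such as "type1" would also pick up a hypothetical element named "type10_x_y". A possible variation, sketched here under the assumption that names always have the form "<type>_...", matches the exact prefix and names the result in the same step:

```r
# Example data copied from the question
list_example <- list(
  type1_a_b = data.frame(id = 1:3, a = 1:3, b = 4:6),
  type1_c_d = data.frame(id = 1:5, c = 1:5, d = 5:9),
  type2_e_f = data.frame(id = c(1, 3, 4), e = 1:3, f = 4:6),
  type2_g_h = data.frame(id = c(2, 3, 4), g = 1:3, h = 5:7)
)
data_types <- c("type1", "type2")

# lapply() keeps the names of its input, so naming data_types up front
# gives a named result without a separate names<- call.
result <- lapply(setNames(data_types, data_types), function(type) {
  # Assumption: names look like "<type>_..."; match the prefix exactly
  keep <- startsWith(names(list_example), paste0(type, "_"))
  Reduce(function(p, q) merge(p, q, by = "id", all = TRUE),
         list_example[keep])
})

result
```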