I have a list of data frames. I would like to take the union of column value in all data frames when column Names between the data frames match.
here is a toy data.
df1 <- data.frame(group = c("G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G1", "G1"),
Name = c("B", "B","B", "A", "A",'D',"D" , "E", "C", "C"), value = c(2,4,5,2,4,7, 1, 2,4,1))
df2 <- data.frame(group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2" , "G1", "G1"),
Name = c("B", "B" , "A", "A", "D", "E", "E", "E", "C", "C"), value = c(2, 3, 5, 1, 7, 2, 4, 8, 9,1))
df <- rbind(df1, df2)
df.list <- split(df, f=df$group)
the desire output is as follows:
B = 2,3,4,5
A = 1,2,4,5
D = 1,7
E = 2,4,8
C = 1,4,9
CodePudding user response:
I will use the tidyverse to solve the problem and assume that the desired output is a list of vectors. In the solution, I ensure that only the Name
's that are common between df1
and df2
are kept.
library(tidyverse)
bind_rows(df1, df2) %>%
filter(Name %in% df1$Name, Name %in% df2$Name) %>%
split(.$Name) %>%
map(~ sort(unique(.x$value)))
Output:
$A
[1] 1 2 4 5
$B
[1] 2 3 4 5
$C
[1] 1 4 9
$D
[1] 1 7
$E
[1] 2 4 8
If there are more than two data frames, you could put them all in a list and use this solution which works on any number of data frames.
dfs = list(df1, df2)
# First identify the common names within the data frames
common_names = dfs %>%
map(`[[`, "Name") %>%
reduce(intersect)
common_names
#> [1] "B" "A" "D" "E" "C"
# Now we can do the same thing as earlier
dfs %>%
reduce(bind_rows) %>%
filter(Name %in% common_names) %>%
split(.$Name) %>%
map(~ sort(unique(.x$value)))