Home > database >  find the union of a column between matched rows of data frames of a list in r
find the union of a column between matched rows of data frames of a list in r

Time:01-01

I have a list of data frames. I would like to take the union of column value in all data frames when column Names between the data frames match.

here is a toy data.

df1 <- data.frame(group = c("G1", "G1", "G1", "G1", "G1", "G2", "G2", "G2", "G1", "G1"), 
              Name = c("B", "B","B", "A", "A",'D',"D" , "E", "C", "C"), value = c(2,4,5,2,4,7, 1, 2,4,1))
df2 <- data.frame(group = c("G1", "G1", "G1", "G1", "G2", "G2", "G2", "G2" , "G1", "G1"), 
              Name = c("B", "B" , "A", "A", "D", "E", "E", "E", "C", "C"), value = c(2, 3, 5, 1, 7, 2, 4, 8, 9,1))
df <- rbind(df1, df2)

df.list <- split(df, f=df$group)

the desire output is as follows:

  B = 2,3,4,5
  A = 1,2,4,5
  D = 1,7
  E = 2,4,8
  C = 1,4,9

CodePudding user response:

I will use the tidyverse to solve the problem and assume that the desired output is a list of vectors. In the solution, I ensure that only the Name's that are common between df1 and df2 are kept.

library(tidyverse)
bind_rows(df1, df2) %>% 
  filter(Name %in% df1$Name, Name %in% df2$Name) %>%
  split(.$Name) %>% 
  map(~ sort(unique(.x$value)))

Output:

$A
[1] 1 2 4 5

$B
[1] 2 3 4 5

$C
[1] 1 4 9

$D
[1] 1 7

$E
[1] 2 4 8

If there are more than two data frames, you could put them all in a list and use this solution which works on any number of data frames.

dfs = list(df1, df2)
# First identify the common names within the data frames
common_names = dfs %>%
  map(`[[`, "Name") %>%
  reduce(intersect)
common_names
#> [1] "B" "A" "D" "E" "C"

# Now we can do the same thing as earlier
dfs %>%
  reduce(bind_rows) %>%
  filter(Name %in% common_names) %>%
  split(.$Name) %>%
  map(~ sort(unique(.x$value)))
  • Related