Home > Software design >  Check matches between two data frames in R
Check matches between two data frames in R

Time:05-13

I have two almost identical dataframes containing the same people (df_A and df_B). I would now like to check for each person how many values ​​in df_A and df_B match (e.g., Person 1 has 3 identical values in df_A and df_B, whereas Person 4 has 2 identical values).

I would like to create new variables that contain the information about the number of matching values.

df_A and df_B could look like this:

df_A <- read.table(text=
"ID    Var_1  Var_2    Var_3      Var_4      Var_5      Var_6
1       1       NA       NA       1          NA         1
2       NA      NA       NA       1          1          1
3       NA      1        1        NA         NA         1
4       1       1        NA       NA         1          NA
5       NA      NA        NA       1         1          1", header=TRUE)

df_B <- read.table(text=
"ID   Var_1  Var_2    Var_3        Var_4      Var_5      Var_6
1       1       NA       NA        1          NA          1
2       NA      NA       NA        1          1           1
3       1       NA       1         1          NA          NA
4       1       1        1         NA         NA          NA
5       1       1        1         NA         NA          NA", header=TRUE)

Ideally, the end result would look like this:

df_C <- read.table(text=
"ID    Matches
1      3    
2      3
3      1
4      2
5      0", header=TRUE)

Do you have any ideas on how achieve this most efficiently using R? I'm relatively new to R and would like to learn how to solve such problems without lengthy code. Thanks for your hints!

CodePudding user response:

Here's an idea.

library(dplyr)
library(tidyr)

left_join(df_A, df_B, by = 'ID') %>%
  pivot_longer(-ID, names_pattern = '(.*).[xy]') %>%
  group_by(ID, name) %>%
  summarise(matches = !any(is.na(value)) & n_distinct(value, na.rm = TRUE)) %>%
  summarise(matches = sum(matches))

#> # A tibble: 5 × 2
#>      ID matches
#>   <int>   <int>
#> 1     1       3
#> 2     2       3
#> 3     3       1
#> 4     4       2
#> 5     5       0
  • Related