Home > Back-end >  How can I check the values of two columns of 2 data frames for discrepancies grouped by a variable i
How can I check the values of two columns of 2 data frames for discrepancies grouped by a variable i

Time:09-03

I have a data frame A, say :

id var
A phone
B car
C bike
D chair
E mouse
Z NA

now I have another data frame B, that contain all the registered products by the id:

id var2
A phone
A car
B NA
C bike
C phone
D chair
D phone
D car
E chair
E phone
E car
Z car
H bike

I want to check if the values declared in the data frame A exist in the registration data frame B by id.Actually to mutate a column that will check the condition by id and will return TRUE or FALSE accordingly.

Ideally the resulted output must de a data frame in R that will look like this :

id var CHECK
A phone TRUE
B car FALSE
C bike TRUE
D chair TRUE
E mouse FALSE
Z NA FALSE

the data in R are the following:

library(dplyr)
id = c("A","B","C","D","E","Z")
var = c("phone","car","bike","chair","mouse",NA);var
A = tibble(id,var);A
id = c(rep("A",2),"B",rep("C",2),rep("D",3),rep("E",3),"Z","H")
var2 = c("phone","car",NA,"bike","phone","chair","phone","car","chair","phone","car","car","bike")
B = tibble(id,var2);B

Any help?

CodePudding user response:

You can join the two dataframes and check if all the values exist in another.

library(dplyr)

A %>%
  left_join(B, by = 'id') %>%
  group_by(id, var) %>%
  summarise(CHECK = all(var %in% var2), .groups = "drop")

#   id    var   CHECK
#  <chr> <chr> <lgl>
#1 A     phone TRUE 
#2 B     car   FALSE
#3 C     bike  TRUE 
#4 D     chair TRUE 
#5 E     mouse FALSE
#6 Z     NA    FALSE

CodePudding user response:

Workaround using left_join and create "CHECK" using a comparison per group_by to get TRUE/FALSE where the first row is the right row and replace the NA with FALSE like this:

library(tibble)
id = c("A","B","C","D","E","Z")
var = c("phone","car","bike","chair","mouse",NA);var
id = c(rep("A",2),"B",rep("C",2),rep("D",3),rep("E",3),"Z")
var2 = c("phone","car",NA,"bike","phone","chair","phone","car","chair","phone","car","car")
B = tibble(id,var2);B

library(dplyr)
A %>%
  left_join(., B, by = "id") %>%
  group_by(id) %>%
  mutate(CHECK = var == var2,
         CHECK = replace(CHECK, is.na(CHECK), FALSE)) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  select(-var2)
#> # A tibble: 6 × 3
#>   id    var   CHECK
#>   <chr> <chr> <lgl>
#> 1 A     phone TRUE 
#> 2 B     car   FALSE
#> 3 C     bike  TRUE 
#> 4 D     chair TRUE 
#> 5 E     mouse FALSE
#> 6 Z     <NA>  FALSE

Created on 2022-09-03 with reprex v2.0.2

  • Related