I have a data frame A, say :
id | var |
---|---|
A | phone |
B | car |
C | bike |
D | chair |
E | mouse |
Z | NA |
now I have another data frame B, that contain all the registered products by the id:
id | var2 |
---|---|
A | phone |
A | car |
B | NA |
C | bike |
C | phone |
D | chair |
D | phone |
D | car |
E | chair |
E | phone |
E | car |
Z | car |
H | bike |
I want to check if the values declared in the data frame A exist in the registration data frame B by id.Actually to mutate a column that will check the condition by id and will return TRUE or FALSE accordingly.
Ideally the resulted output must de a data frame in R that will look like this :
id | var | CHECK |
---|---|---|
A | phone | TRUE |
B | car | FALSE |
C | bike | TRUE |
D | chair | TRUE |
E | mouse | FALSE |
Z | NA | FALSE |
the data in R are the following:
library(dplyr)
id = c("A","B","C","D","E","Z")
var = c("phone","car","bike","chair","mouse",NA);var
A = tibble(id,var);A
id = c(rep("A",2),"B",rep("C",2),rep("D",3),rep("E",3),"Z","H")
var2 = c("phone","car",NA,"bike","phone","chair","phone","car","chair","phone","car","car","bike")
B = tibble(id,var2);B
Any help?
CodePudding user response:
You can join the two dataframes and check if all
the values exist in another.
library(dplyr)
A %>%
left_join(B, by = 'id') %>%
group_by(id, var) %>%
summarise(CHECK = all(var %in% var2), .groups = "drop")
# id var CHECK
# <chr> <chr> <lgl>
#1 A phone TRUE
#2 B car FALSE
#3 C bike TRUE
#4 D chair TRUE
#5 E mouse FALSE
#6 Z NA FALSE
CodePudding user response:
Workaround using left_join
and create "CHECK" using a comparison per group_by
to get TRUE/FALSE where the first row is the right row and replace the NA with FALSE like this:
library(tibble)
id = c("A","B","C","D","E","Z")
var = c("phone","car","bike","chair","mouse",NA);var
id = c(rep("A",2),"B",rep("C",2),rep("D",3),rep("E",3),"Z")
var2 = c("phone","car",NA,"bike","phone","chair","phone","car","chair","phone","car","car")
B = tibble(id,var2);B
library(dplyr)
A %>%
left_join(., B, by = "id") %>%
group_by(id) %>%
mutate(CHECK = var == var2,
CHECK = replace(CHECK, is.na(CHECK), FALSE)) %>%
filter(row_number() == 1) %>%
ungroup() %>%
select(-var2)
#> # A tibble: 6 × 3
#> id var CHECK
#> <chr> <chr> <lgl>
#> 1 A phone TRUE
#> 2 B car FALSE
#> 3 C bike TRUE
#> 4 D chair TRUE
#> 5 E mouse FALSE
#> 6 Z <NA> FALSE
Created on 2022-09-03 with reprex v2.0.2