How can i match the values of a column according to another of a data frame in R using dplyr?-CodePudding

I have two datasets that look like this: The first is :

id	gear
A1	A
A2	B
A3	C
A4	D
A5	E
A6	F
A7	G
A8	H
A9	I
A10	G

And the second :

id	gear2
A1	A
A4	E
A2	A
A5	E
A13	B
A3	C
A9	I
A8	B
A7	G
A20	G
A21	B
A23	D
A33	E

There two unbalanced data frames.The first data frame is the recorded data set.The second one contains what is known about the gear coming from an id.I want to check the first data frame if what is recorded actually is known or unknown.Specifically i want to check given the id code to check is the gear is the same in both data frames.But individually on each id. Ideally the result must be :

id	gear	CHECK
A1	A	TRUE
A2	B	FALSE
A3	C	TRUE
A4	D	FALSE
A5	E	TRUE
A6	F	N/A
A7	G	TRUE
A8	H	FALSE
A9	I	TRUE
A10	G	N/A

id1 = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10")
Gear1 = c("A","B","C","D","E","F","G","H","I","G")
dat1 = tibble(id1,Gear1);dat1

id2 = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33")
Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
dat2 = tibble(id2,Gear2);dat2

How can i do it in R using the dplyr package ? Any help ?

CodePudding user response：

You can use a left_join and then compare the two columns:

library(dplyr)

dat1 %>% 
  left_join(dat2, by = c("id1" = "id2")) %>% 
  mutate(CHECK = Gear1 == Gear2) %>% 
  select(id = id1, gear = Gear1, CHECK)

# A tibble: 10 × 3
   id    gear  CHECK
   <chr> <chr> <lgl>
 1 A1    A     TRUE 
 2 A2    B     FALSE
 3 A3    C     TRUE 
 4 A4    D     FALSE
 5 A5    E     TRUE 
 6 A6    F     NA   
 7 A7    G     TRUE 
 8 A8    H     FALSE
 9 A9    I     TRUE 
10 A10   G     NA

Have a look at the dplyr documentation how to use joins.

Edit

Thanks to Ritchie Sacramento for the trick to do the renaming directly in the select function.

CodePudding user response：

Try this:

dat1 |> 
  mutate(check = ifelse(!id1 %in% dat2$id2, 
                        NA,
                        ifelse(paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2), 
                               TRUE, 
                               FALSE)))

CodePudding user response：

library(tidyverse)

dat1 = rename(dat1, id = 'id1')
dat2 = rename(dat2, id = 'id2')

check_data = dat1 %>% 
  full_join(dat2, by='id') %>% 
  mutate(check = ifelse(Gear1==Gear2, TRUE, FALSE)) %>% 
  filter(! is.na(Gear1))

Output:

check_data
# A tibble: 10 x 4
   id    Gear1 Gear2 check
   <chr> <chr> <chr> <lgl>
 1 A1    A     A     TRUE 
 2 A2    B     A     FALSE
 3 A3    C     C     TRUE 
 4 A4    D     E     FALSE
 5 A5    E     E     TRUE 
 6 A6    F     NA    NA   
 7 A7    G     G     TRUE 
 8 A8    H     B     FALSE
 9 A9    I     I     TRUE 
10 A10   G     NA    NA