I have two datasets that look like this: The first is :
id | gear |
---|---|
A1 | A |
A2 | B |
A3 | C |
A4 | D |
A5 | E |
A6 | F |
A7 | G |
A8 | H |
A9 | I |
A10 | G |
And the second :
id | gear2 |
---|---|
A1 | A |
A4 | E |
A2 | A |
A5 | E |
A13 | B |
A3 | C |
A9 | I |
A8 | B |
A7 | G |
A20 | G |
A21 | B |
A23 | D |
A33 | E |
There two unbalanced data frames.The first data frame is the recorded data set.The second one contains what is known about the gear coming from an id.I want to check the first data frame if what is recorded actually is known or unknown.Specifically i want to check given the id code to check is the gear is the same in both data frames.But individually on each id. Ideally the result must be :
id | gear | CHECK |
---|---|---|
A1 | A | TRUE |
A2 | B | FALSE |
A3 | C | TRUE |
A4 | D | FALSE |
A5 | E | TRUE |
A6 | F | N/A |
A7 | G | TRUE |
A8 | H | FALSE |
A9 | I | TRUE |
A10 | G | N/A |
id1 = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10")
Gear1 = c("A","B","C","D","E","F","G","H","I","G")
dat1 = tibble(id1,Gear1);dat1
id2 = c("A1","A4","A2","A5","A13","A3","A9","A8","A7","A20","A21","A23","A33")
Gear2 = c("A","E","A","E","B","C","I","B","G","G","B","D","E")
dat2 = tibble(id2,Gear2);dat2
How can i do it in R using the dplyr package ? Any help ?
CodePudding user response:
You can use a left_join
and then compare the two columns:
library(dplyr)
dat1 %>%
left_join(dat2, by = c("id1" = "id2")) %>%
mutate(CHECK = Gear1 == Gear2) %>%
select(id = id1, gear = Gear1, CHECK)
# A tibble: 10 × 3
id gear CHECK
<chr> <chr> <lgl>
1 A1 A TRUE
2 A2 B FALSE
3 A3 C TRUE
4 A4 D FALSE
5 A5 E TRUE
6 A6 F NA
7 A7 G TRUE
8 A8 H FALSE
9 A9 I TRUE
10 A10 G NA
Have a look at the dplyr
documentation how to use joins.
Edit
Thanks to Ritchie Sacramento for the trick to do the renaming directly in the select
function.
CodePudding user response:
Try this:
dat1 |>
mutate(check = ifelse(!id1 %in% dat2$id2,
NA,
ifelse(paste(id1, Gear1) %in% paste(dat2$id2, dat2$Gear2),
TRUE,
FALSE)))
CodePudding user response:
library(tidyverse)
dat1 = rename(dat1, id = 'id1')
dat2 = rename(dat2, id = 'id2')
check_data = dat1 %>%
full_join(dat2, by='id') %>%
mutate(check = ifelse(Gear1==Gear2, TRUE, FALSE)) %>%
filter(! is.na(Gear1))
Output:
check_data
# A tibble: 10 x 4
id Gear1 Gear2 check
<chr> <chr> <chr> <lgl>
1 A1 A A TRUE
2 A2 B A FALSE
3 A3 C C TRUE
4 A4 D E FALSE
5 A5 E E TRUE
6 A6 F NA NA
7 A7 G G TRUE
8 A8 H B FALSE
9 A9 I I TRUE
10 A10 G NA NA