I have one dataframe A like this:
name <- c("John", "Bill", "Amy", "Bill", "Mia")
Present.ID <- c(12345, 678910, 12345, 8090100, 246810)
A <- data.frame(name, Present.ID)
And one dataframe B like this:
name <- c("John", "Bill", "Amy", "Bill", "Mia")
Applied.ID <- c(12345, 678910, 12345, 8090100, NA)
B <- data.frame(name, Applied.ID)
I want to check that if an ID exists for a certain name in dataframe A, it also exists for that same name in dataframe B. Each name refers to a single person, so if a name appears twice, that person had two different IDs at different times. However, different names can share an ID (for example, John and Amy both have 12345). In my actual dataframe, there are also many names that do not have any IDs assigned to them.
My approach is to check that each combination of name and Present.ID in dataframe A also appears as a combination of name and Applied.ID in dataframe B.
I am looking for a function that can do something like
A$check <- A$Present.ID %in% B$Applied.ID
but with the added requirement of the name column matching too.
I want the output to look like this:
|name | Applied.ID | check|
|-----|------------|------|
|John | 12345 | TRUE |
|Bill | 678910 | TRUE |
|Amy | 12345 | TRUE |
|Bill |8090100 | TRUE |
|Mia | NA | FALSE|
If anything is confusing, I am happy to clarify. Thanks for any help!
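Answer:
Use `interaction()` to combine the columns of each dataframe into one key per row, then test the keys with `%in%` (columns are matched by position, so name must come first in both):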
cbind(A, check = interaction(A) %in% interaction(B))
name Present.ID check
1 John 12345 TRUE
2 Bill 678910 TRUE
3 Amy 12345 TRUE
4 Bill 8090100 TRUE
5 Mia 246810 FALSE
Or, equivalently, build the row keys with `paste()`:
cbind(A, check = do.call(paste, A) %in% do.call(paste, B))
name Present.ID check
1 John 12345 TRUE
2 Bill 678910 TRUE
3 Amy 12345 TRUE
4 Bill 8090100 TRUE
5 Mia 246810 FALSE
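Both versions above glue each row into a single key and match keys by position. A minimal self-contained sketch of the same idea as a reusable function is below; `pair_in` is a hypothetical name, and the separator `"\u001f"` is an assumption, chosen because `paste()`'s default space (and `interaction()`'s default `"."`) could make two different rows produce the same key if names contain that character.

```r
# Hypothetical helper (not from either answer): TRUE for each row of `x`
# whose full combination of values also occurs as a row of `y`.
# Columns are matched by position, so `y` must list its columns in the
# same order as `x` (here: name first, then the ID).
pair_in <- function(x, y) {
  # One string key per row. "\u001f" (ASCII unit separator) is an arbitrary
  # choice that is unlikely to appear in real data, so rows such as
  # ("a b", "c") and ("a", "b c") do not collapse to the same key.
  key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\u001f"))
  key(x) %in% key(y)
}

A <- data.frame(
  name = c("John", "Bill", "Amy", "Bill", "Mia"),
  Present.ID = c(12345, 678910, 12345, 8090100, 246810)
)
B <- data.frame(
  name = c("John", "Bill", "Amy", "Bill", "Mia"),
  Applied.ID = c(12345, 678910, 12345, 8090100, NA)
)

A$check <- pair_in(A, B)
A
```

Note that `paste()` turns a missing value into the string `"NA"`, so a row with an NA ID in A would match a row for the same name with an NA ID in B; whether that should count as a match depends on your data.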
Answer:
You can use `mapply()` to apply `%in%` to each pair of corresponding columns (A's name against B's name, A's Present.ID against B's Applied.ID), and then apply `all()` to each row:
A$check <- apply(mapply(`%in%`, A, B), 1, all)
name Present.ID check
1 John 12345 TRUE
2 Bill 678910 TRUE
3 Amy 12345 TRUE
4 Bill 8090100 TRUE
5 Mia 246810 FALSE
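One thing worth checking with this approach: `mapply()` with `%in%` tests each column on its own, so a row passes whenever its name appears somewhere in B and its ID appears somewhere in B, even if that exact pair never occurs. With the question's data both approaches happen to agree. The sketch below uses two made-up rows (`A2` and `B2` are hypothetical, for illustration only) to show where they differ:

```r
# Made-up data: "John" and 678910 both occur in B2, but the pair
# ("John", 678910) does not.
A2 <- data.frame(name = c("John", "Bill"), Present.ID = c(678910, 678910))
B2 <- data.frame(name = c("John", "Bill"), Applied.ID = c(12345, 678910))

# Column-by-column check (as in this answer): each column is tested separately.
by_column <- apply(mapply(`%in%`, A2, B2), 1, all)

# Row-key check (as in the first answer): the whole pair must occur in B2.
by_pair <- do.call(paste, A2) %in% do.call(paste, B2)

by_column  # TRUE TRUE
by_pair    # FALSE TRUE
```

If the name and ID must match together, the row-key approaches in the first answer check the pair itself.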