How can I check that two columns in one dataframe both match two columns in another dataframe?

Time:06-07

I have one dataframe A like this:

name <- c("John", "Bill", "Amy", "Bill", "Mia") 
Present.ID <- c(12345, 678910, 12345, 8090100, 246810) 
A <- data.frame(name, Present.ID)

And one dataframe B like this:

name <- c("John", "Bill", "Amy", "Bill", "Mia")
Applied.ID <- c(12345, 678910, 12345, 8090100, NA)
B <- data.frame(name, Applied.ID)

I want to check that if an ID exists for a certain name in dataframe A, it also exists for that same name in dataframe B. Each name refers to a single person, so if a name appears twice, that person had two different IDs at different times. However, different names can share an ID (John and Amy both have 12345). In my actual dataframe, there are also many names that do not have any IDs assigned to them.

My idea is to check, row by row, that each combination of name and Present.ID in dataframe A also appears as a combination of name and Applied.ID in dataframe B.

I am looking for a function that can do something like

A$check <- A$Present.ID %in% B$Applied.ID 

but with the added requirement of the name column matching too.

I want the output to look like this:

  | name | Applied.ID | check |
  |------|------------|-------|
  | John | 12345      | TRUE  |
  | Bill | 678910     | TRUE  |
  | Amy  | 12345      | TRUE  |
  | Bill | 8090100    | TRUE  |
  | Mia  | NA         | FALSE |

If anything is confusing I am happy to clarify. Thanks for any help!

Answer:

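Use interaction(), which combines the values of each row into a single label (for example "John.12345"), and test those labels with %in%: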

cbind(A, check = interaction(A) %in% interaction(B))

  name Present.ID check
1 John      12345  TRUE
2 Bill     678910  TRUE
3  Amy      12345  TRUE
4 Bill    8090100  TRUE
5  Mia     246810 FALSE
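To see what interaction() is actually matching on, here is a minimal sketch that rebuilds A and B with data.frame() (an assumption; as.data.frame(cbind(...)) stores the IDs as character instead, but the row labels come out the same) and inspects the keys:

```r
# Example data rebuilt with data.frame() (assumption, see above)
name <- c("John", "Bill", "Amy", "Bill", "Mia")
A <- data.frame(name, Present.ID = c(12345, 678910, 12345, 8090100, 246810))
B <- data.frame(name, Applied.ID = c(12345, 678910, 12345, 8090100, NA))

# interaction() joins the values of each row with sep = "." into one factor,
# so every row of A becomes a single label such as "John.12345"
key_A <- interaction(A)
key_B <- interaction(B)  # Mia's row has an NA ID, so its key is NA
print(as.character(key_A))

# %in% compares factors by their labels, so a row is TRUE only when
# the same name/ID pair occurs in some row of B
check <- key_A %in% key_B
print(check)
```

Because the comparison is on the combined label, a name paired with some other person's ID does not count as a match.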

Or paste the values of each row into one string and compare those strings:

cbind(A, check = do.call(paste, A) %in% do.call(paste, B))


  name Present.ID check
1 John      12345  TRUE
2 Bill     678910  TRUE
3  Amy      12345  TRUE
4 Bill    8090100  TRUE
5  Mia     246810 FALSE
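paste() joins with a single space by default, so when the columns are free text that may contain spaces, two different rows can produce the same key (for example "a b" + "c" and "a" + "b c" both give "a b c"). With these numeric IDs that cannot happen, but a minimal sketch that guards against it passes a different separator through do.call(); the choice of "\r" assumes that character never appears in the data:

```r
name <- c("John", "Bill", "Amy", "Bill", "Mia")
A <- data.frame(name, Present.ID = c(12345, 678910, 12345, 8090100, 246810))
B <- data.frame(name, Applied.ID = c(12345, 678910, 12345, 8090100, NA))

# With the default sep = " ", different free-text rows can collide
collides <- paste("a b", "c") == paste("a", "b c")  # TRUE: both are "a b c"

# c(A, sep = "\r") turns the data frame into a list of its columns plus a
# named sep argument, which do.call() forwards to paste()
key_A <- do.call(paste, c(A, sep = "\r"))
key_B <- do.call(paste, c(B, sep = "\r"))

A$check <- key_A %in% key_B
print(A)
```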

Answer:

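You can use mapply to apply %in% to each column of A and the corresponding column of B, and then apply all across each row. Note that this tests the name and the ID separately: a row is TRUE whenever its name occurs somewhere in B and its ID occurs somewhere in B, even if not in the same row. It gives the expected result for this example, but it is a looser check than matching the pairs.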

A$check <- apply(mapply(`%in%`, A, B), 1, all)

  name Present.ID check
1 John      12345  TRUE
2 Bill     678910  TRUE
3  Amy      12345  TRUE
4 Bill    8090100  TRUE
5  Mia     246810 FALSE
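A small illustration of the difference between checking columns separately and checking pairs. The rows in A2 are hypothetical: each name and each ID occurs somewhere in B, but never together in the same row of B.

```r
name <- c("John", "Bill", "Amy", "Bill", "Mia")
B <- data.frame(name, Applied.ID = c(12345, 678910, 12345, 8090100, NA))

# Hypothetical rows: John never had 678910 and Mia never had 12345 in B
A2 <- data.frame(name = c("John", "Mia"), Present.ID = c(678910, 12345))

# Column-by-column membership: name and ID are each tested on their own
col_check <- apply(mapply(`%in%`, A2, B), 1, all)

# Pair membership: name and ID must co-occur in one row of B
pair_check <- do.call(paste, A2) %in% do.call(paste, B)

print(col_check)   # TRUE TRUE (both are false positives)
print(pair_check)  # FALSE FALSE
```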