Text subsetting between vectors in R

Time:12-18

I have two vectors of given names in R, each stored as a single-column data frame:

A <- data.frame(c("Nick", "Maria", "Liam", "Oliver", "Sophia", "james", "Lucas Theo"))
B  <- data.frame(c("Liam", "Theo", "Evelyn Elsa", "James", "Harper", "Amelia"))

I want to compare the two vectors and create a vector C with the names from B that are not in A. The comparison should ignore case, i.e. recognise that James and james are the same name. A name that appears in A only as part of a double name, e.g. Theo in "Lucas Theo", should still count as different. In the end, the result must be

C <- data.frame(c("Theo", "Evelyn Elsa", "Harper","Amelia")) 

Can someone help me?

CodePudding user response:

We may extract the vectors and convert them to a common case (upper), use %in% to find the elements of B that are present in A, negate (!), and subset B based on that logical vector

C <- data.frame(col1 = B[[1]][!toupper(B[[1]]) %in% toupper(A[[1]])])

-output

C
         col1
1        Theo
2 Evelyn Elsa
3      Harper
4      Amelia
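
For readers who want to see the intermediate values, here is the same one-liner split into steps. It is only a sketch: the column name `name`, the helper names `a`, `b` and `in_a`, and the `stringsAsFactors = FALSE` argument (added so the columns are character vectors on older R versions too) are assumptions, not part of the original question.

```r
# Rebuild the inputs from the question as single-column data frames.
A <- data.frame(name = c("Nick", "Maria", "Liam", "Oliver", "Sophia", "james", "Lucas Theo"),
                stringsAsFactors = FALSE)
B <- data.frame(name = c("Liam", "Theo", "Evelyn Elsa", "James", "Harper", "Amelia"),
                stringsAsFactors = FALSE)

# Extract each column as a plain vector and convert to a common case.
a <- toupper(A[[1]])
b <- toupper(B[[1]])

# %in% compares whole elements, so "THEO" does not match "LUCAS THEO".
in_a <- b %in% a

# Keep the original spelling of the B names that were not found in A.
C <- data.frame(col1 = B[[1]][!in_a], stringsAsFactors = FALSE)
C
```

Because %in% matches complete strings rather than substrings, "Theo" is kept even though "Lucas Theo" is in A, while "James" is dropped because it equals "james" once both are upper-cased.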