Matching columns and finding the unmatched value-CodePudding

df <- data.frame(A = c('100, 200, 300, 100', '100, 200', '100, 200, 300'), B = c('100, 200, 300', '100, 200, 300', '100, 200'), stringsAsFactors = FALSE) I have a data frame example here

I have to do a comparison, the result of which can be found in column 'Flag.' If it's a perfect match, then Flag will be Y. Otherwise N. If the Flag is N, then I've to determine what the missing value is. Taking Row #2 as an example: 300 is missing from Column A, so I populate that under Value A.

CodePudding user response：

It's pretty messy but it will get what you want.

library(dplyr)
  df1 <- data.frame(
    A = c("100,200,300", "100,200", "100,200,400"),
    B = c("100,200,300", "100,200,300", "100,200")
  )
  isEmpty <- function(x) {
    return(length(x)==0)
  }
df1 <- df1 %>%
  mutate(A = as.character(A),
         B = as.character(B)) %>%
  mutate(Flag = ifelse(A == B, "Y", "N"))

df2 <- apply(df1, 1, function(x){
  if (x[3] == "N"){
    a <- x[1] %>% strsplit(., ",") %>% unlist %>% as.numeric()
    b <- x[2] %>% strsplit(., ",") %>% unlist %>% as.numeric()
    val_A <- ifelse(isEmpty(setdiff(a,b)), "A", setdiff(a,b))
    val_B <- ifelse(isEmpty(setdiff(b,a)), "A", setdiff(b,a))
    c(val_A,val_B)
  } else {
    c(NA,NA)
  }
}) %>% t %>% as.data.frame() %>% dplyr::rename(Value_A = V1, Value_B = V2)
cbind(df1, df2)

            A           B Flag Value_A Value_B
1 100,200,300 100,200,300    Y    <NA>    <NA>
2     100,200 100,200,300    N       A     300
3 100,200,400     100,200    N     400       A

CodePudding user response：

You can split column A and column B on comma and compare them to get the output.

library(dplyr)
library(purrr)

df %>%
  mutate(across(A:B, strsplit, ',\\s*'), 
         Flag = map2_chr(A, B, ~if(all(.x %in% .y) && all(.y %in% .x)) 'Y' else 'N'),
         ColumnA = map2_chr(A, B, ~toString(setdiff(.y, .x))), 
         ColumnB = map2_chr(A, B, ~toString(setdiff(.x, .y))))

#              A             B Flag ColumnA ColumnB
#1 100, 200, 300 100, 200, 300    Y                
#2      100, 200 100, 200, 300    N     300        
#3 100, 200, 300      100, 200    N             300

For this example, to get Flag column we can just do Flag = ifelse(A == B, 'Y', 'N') but what if you have values like 100, 200, 300 and 300, 200, 100 in that case ifelse would return 'N' and hence I have preferred to split and compare.

For the updated data where the count is important as well, we may use vsetdiff function from vecsets package.

library(vecsets)

df %>%
  mutate(across(A:B, strsplit, ',\\s*'), 
         Flag = map2_chr(A, B, ~if(all(.x %in% .y) && all(.y %in% .x)) 'Y' else 'N'),
         ColumnA = map2_chr(A, B, ~toString(vsetdiff(.y, .x))), 
         ColumnB = map2_chr(A, B, ~toString(vsetdiff(.x, .y))))