df <- data.frame(A = c('100, 200, 300, 100', '100, 200', '100, 200, 300'), B = c('100, 200, 300', '100, 200, 300', '100, 200'), stringsAsFactors = FALSE)
I have a data frame example here
I have to do a comparison, the result of which can be found in column 'Flag.' If it's a perfect match, then Flag will be Y. Otherwise N. If the Flag is N, then I've to determine what the missing value is. Taking Row #2 as an example: 300 is missing from Column A, so I populate that under Value A.
CodePudding user response:
It's pretty messy but it will get what you want.
library(dplyr)
df1 <- data.frame(
A = c("100,200,300", "100,200", "100,200,400"),
B = c("100,200,300", "100,200,300", "100,200")
)
isEmpty <- function(x) {
return(length(x)==0)
}
df1 <- df1 %>%
mutate(A = as.character(A),
B = as.character(B)) %>%
mutate(Flag = ifelse(A == B, "Y", "N"))
df2 <- apply(df1, 1, function(x){
if (x[3] == "N"){
a <- x[1] %>% strsplit(., ",") %>% unlist %>% as.numeric()
b <- x[2] %>% strsplit(., ",") %>% unlist %>% as.numeric()
val_A <- ifelse(isEmpty(setdiff(a,b)), "A", setdiff(a,b))
val_B <- ifelse(isEmpty(setdiff(b,a)), "A", setdiff(b,a))
c(val_A,val_B)
} else {
c(NA,NA)
}
}) %>% t %>% as.data.frame() %>% dplyr::rename(Value_A = V1, Value_B = V2)
cbind(df1, df2)
A B Flag Value_A Value_B
1 100,200,300 100,200,300 Y <NA> <NA>
2 100,200 100,200,300 N A 300
3 100,200,400 100,200 N 400 A
CodePudding user response:
You can split column A and column B on comma and compare them to get the output.
library(dplyr)
library(purrr)
df %>%
mutate(across(A:B, strsplit, ',\\s*'),
Flag = map2_chr(A, B, ~if(all(.x %in% .y) && all(.y %in% .x)) 'Y' else 'N'),
ColumnA = map2_chr(A, B, ~toString(setdiff(.y, .x))),
ColumnB = map2_chr(A, B, ~toString(setdiff(.x, .y))))
# A B Flag ColumnA ColumnB
#1 100, 200, 300 100, 200, 300 Y
#2 100, 200 100, 200, 300 N 300
#3 100, 200, 300 100, 200 N 300
For this example, to get Flag
column we can just do Flag = ifelse(A == B, 'Y', 'N')
but what if you have values like 100, 200, 300
and 300, 200, 100
in that case ifelse
would return 'N'
and hence I have preferred to split and compare.
For the updated data where the count is important as well, we may use vsetdiff
function from vecsets
package.
library(vecsets)
df %>%
mutate(across(A:B, strsplit, ',\\s*'),
Flag = map2_chr(A, B, ~if(all(.x %in% .y) && all(.y %in% .x)) 'Y' else 'N'),
ColumnA = map2_chr(A, B, ~toString(vsetdiff(.y, .x))),
ColumnB = map2_chr(A, B, ~toString(vsetdiff(.x, .y))))