This might be quite basic, but I need to compare two columns of strings from two different datasets, then delete the entries that are duplicated so I can work with the remaining elements.
Currently, I have the following:
compare <- append(test1$TL,test2$tl)
compare <- compare[!duplicated(compare)]
But it only deletes one copy of each duplicated element and keeps the other. I need it to delete both copies so I can work with only the non-duplicates. Can anyone help?
CodePudding user response:
You may try
x <- c(1,2,3)
y <- c(3,4,5)
z <- union(x,y)
z[! z %in% intersect(x,y)]
[1] 1 2 4 5
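Applied to the columns from the question, the same idea could be sketched as below. The two data frames are made-up stand-ins (only the column names TL and tl come from the question), and the variable names all_values, shared and only_once are just for illustration.

```r
# Hypothetical sample data; only the column names TL and tl are from the question.
test1 <- data.frame(TL = c("apple", "banana", "cherry"), stringsAsFactors = FALSE)
test2 <- data.frame(tl = c("cherry", "date", "elderberry"), stringsAsFactors = FALSE)

# Every distinct value from either column.
all_values <- union(test1$TL, test2$tl)
# Values that occur in both columns.
shared <- intersect(test1$TL, test2$tl)
# Keep only the values that are not shared.
only_once <- all_values[!all_values %in% shared]
print(only_once)
```

Here "cherry" occurs in both columns, so it is dropped entirely and the other four strings remain.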
CodePudding user response:
Using %in% or duplicated.
x <- 1:3
y <- 3:5
c(x[!x %in% y], y[!y %in% x])
#[1] 1 2 4 5
. <- c(x, y)
.[!(duplicated(.) | duplicated(., fromLast = TRUE))]
#[1] 1 2 4 5
Benchmark:
x <- 1:3
y <- 3:5
bench::mark("UniInter" = {z <- union(x,y); z[! z %in% intersect(x,y)]},
"%in%" = c(x[!x %in% y], y[!y %in% x]),
"dupli" = {. <- c(x, y); .[!(duplicated(.) | duplicated(., fromLast = TRUE))]})
#  expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 UniInter    8.15µs    10µs    78539.        0B     39.3  9995     5    127.3ms
#2 %in%        1.68µs  1.99µs   418373.        0B     41.8  9999     1     23.9ms
#3 dupli       4.35µs  5.13µs   178003.        0B     17.8  9999     1     56.2ms
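One thing worth checking before picking the fastest variant: the example vectors above contain no repeats within x or within y, so all three approaches return the same result. When a value is repeated inside one vector, they can differ. A small sketch with made-up inputs (the names res_sets, res_in and res_dup are only for illustration):

```r
# Made-up inputs: "a" is repeated within x, "b" occurs in both x and y.
x <- c("a", "a", "b")
y <- c("b", "c")

# union/intersect: set semantics, so repeats within a vector collapse to one copy.
z <- union(x, y)
res_sets <- z[!z %in% intersect(x, y)]

# %in%: only removes values shared between x and y; repeats within x are kept.
res_in <- c(x[!x %in% y], y[!y %in% x])

# duplicated from both ends: drops every value that occurs more than once anywhere.
v <- c(x, y)
res_dup <- v[!(duplicated(v) | duplicated(v, fromLast = TRUE))]

print(res_sets)  # "a" "c"
print(res_in)    # "a" "a" "c"
print(res_dup)   # "c"
```

Which result is the right one depends on whether repeats inside a single column should count as duplicates. The duplicated variant is the closest match to the code in the question, which concatenates both columns before checking.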