There are 2 datasets
s=structure(list(var1 = c("a", "f", "k", "tt", "ee"), var2 = c("b",
"g", "l", "qq", "rr"), var3 = c("c", "h", "m", "ff", "cc"), var4 = c("d",
"i", "n", "gg", "vv"), var5 = c("e", "j", "o", "aa", "xx"), metric_var = c(100L,
200L, 300L, 567L, 789L)), class = "data.frame", row.names = c(NA,
-5L))
and
medagger=structure(list(var1 = c("a", "z", "w", "f", "k"), var2 = c("b",
"u", "e", "g", "l"), var3 = c("c", "p", "r", "h", "m"), var4 = c("d",
"q", "q", "i", "n"), var5 = c("e", "n", "w", "j", "o"), metric_var = c(100L,
200L, 400L, 500L, 700L)), class = "data.frame", row.names = c(NA,
-5L))
var1-var5 are categorical variables, their values are categories.
if we do an inner join s
and a medagger
we get this result
merge(s,medagger,by=c("var1","var2","var3","var4","var5"))
var1 var2 var3 var4 var5 metric_var.x metric_var.y
1 a b c d e 100 100
2 f g h i j 200 500
3 k l m n o 300 700
only 3 rows to return, because in these datasets there are the same categories.
However, I need the categories that there are in the s
dataset but not in the medagger
dataset
were put in separate dataframes.
In this case, I need the dataframe new
to appear, which will contain only 2 of these rows from s
.
var1 var2 var3 var4 var5 metric_var
tt qq ff gg aa 567
ee rr cc vv xx 789
How to do that the categories that were in s
but not in the medagger
appear in the new
dataframe ?
Thank you.
CodePudding user response:
You need anti_join
from dplyr
library to get the expected result. According to the documentation:
anti_join() return all rows from x without a match in y.
library(dplyr)
df <- s %>%
anti_join(medagger, by = c("var1","var2","var3","var4","var5"))
The output will be like this:
var1 var2 var3 var4 var5 metric_var
1 tt qq ff gg aa 567
2 ee rr cc vv xx 789
CodePudding user response:
library(dplyr)
s %>% inner_join(medagger, by=c("var1","var2","var3","var4","var5")) -> result1
var1 var2 var3 var4 var5 metric_var.x metric_var.y
1 a b c d e 100 100
2 f g h i j 200 500
3 k l m n o 300 700
and the second dataframe:
s %>% anti_join(medagger, by=c("var1","var2","var3","var4","var5")) -> result2
var1 var2 var3 var4 var5 metric_var
1 tt qq ff gg aa 567
2 ee rr cc vv xx 789
In base R, this works assuming that medagger and s do not have NA's:
merge(s,medagger,by=c("var1","var2","var3","var4","var5"), all.x=T) -> result
result2 <- result[!complete.cases(result),]
CodePudding user response:
You could rbind
each row and look if it's not duplicated
.
s[apply(s, 1, \(x) tail(!duplicated(rbind(medagger[1:5], x[1:5])), 1)), ]
# var1 var2 var3 var4 var5 metric_var
# 4 tt qq ff gg aa 567
# 5 ee rr cc vv xx 789