In DATA
below, I was wondering how to find the unique study_id
in which variable var1
and var2
don't form a unique combination?
Obviously, study_id
that consist of a single row (e.g.,Bagheri
) always form unique and are to be ignored.
But how could we find unique study_id
with multiple rows that don't form a unique combination?
My expected answer is:
> [1]"Alavinia" (because it has multiple rows but keeps repeating the combination `var1 == ext` & `var2 == write`)
> [2] "Busch" (because it has multiple rows but repeats the combination `var1 == ext` & `var2 == speak`)
m="
study_id var1 var2
Alavinia ext write
Alavinia ext write
Alavinia ext write
Bagheri ext read
Bartz ext speak
Bartz ext listen
Bartz ext write
Bartz ext read
Busch ext gra
Busch ext read
Busch ext others
Busch ext speak
Busch ext speak
"
DATA <- read.table(text = m, h=T)
CodePudding user response:
Base R method:
subset(data.frame(table(DATA)), Freq > 1)[,'study_id']
[1] Busch Alavinia
or even:
sub(" .*", '', names(which(table(do.call(paste, DATA))>1)))
[1] "Alavinia" "Busch"
CodePudding user response:
A possible solution, based on dplyr
:
library(dplyr)
DATA %>%
add_count(study_id, var1, var2) %>%
distinct %>%
filter(n > 1) %>%
pull(study_id)
#> [1] "Alavinia" "Busch"
Or using a data.table
approach:
library(data.table)
setDT(DATA)[, study_id[.N > 1], .(study_id, var1, var2)]$V1
#> [1] "Alavinia" "Busch"