In DATA below, how can I find the unique study_id for which the variable scale takes on more than one unique value?
The expected answer is Li (scale for Li has both Other and MBTI). How can I find it with base R or dplyr code?
m="
study_id year es_id r se n pub_type context ed_setting age_grp L1 L2 prof scale outcome
Dreyer 1992 130 0 0.0574 305 DocDisse~ Foreign~ CollegeUni~ Adult Afri~ Engl~ NA Other Listen~
Dreyer 1992 131 0.04 0.0574 305 DocDisse~ Foreign~ CollegeUni~ Adult Afri~ Engl~ NA Other Writing
Dreyer 1992 132 -0.03 0.0574 305 DocDisse~ Foreign~ CollegeUni~ Adult Afri~ Engl~ NA Other Reading
Dreyer 1992 133 0 0.0574 305 DocDisse~ Foreign~ CollegeUni~ Adult Afri~ Engl~ NA Other Overall
Ghapanchi 2011 89 0.31 0.0806 141 JournalA~ Foreign~ CollegeUni~ Adult Pers~ Engl~ NA Other Overall
Hassan 2001 177 0.25 0.117 71 NA Foreign~ CollegeUni~ NA Arab~ Engl~ NA Other Speaki~
Kralova 2012 137 0.0252 0.117 75 JournalA~ Foreign~ CollegeUni~ Adult Slov~ Engl~ Inte~ Other Speaki~
Li 2009 55 -0.04 0.132 59 JournalA~ Foreign~ CollegeUni~ Adult Chin~ Engl~ NA Other Grammar
Li 2009 56 0.355 0.124 59 JournalA~ Foreign~ CollegeUni~ Adult Chin~ Engl~ NA Other Pragma~
Li 2003 57 0.039 0.0735 187 JournalA~ Foreign~ CollegeUni~ Multip~ Chin~ Engl~ NA MBTI Overall
"
DATA <- read.table(text = m, header = TRUE)
CodePudding user response:
Here's a way in dplyr as well as base R.
The idea is to keep only the study_id groups that have more than one unique scale value, then return each such study_id once.
library(dplyr)
DATA %>%
group_by(study_id) %>%
dplyr::filter(n_distinct(scale) > 1) %>%
ungroup() %>%
distinct(study_id)
# study_id
# <chr>
#1 Li
Base R -
unique(subset(DATA,
              ave(scale, study_id, FUN = function(x) length(unique(x))) > 1,
              select = study_id))
# study_id
#8 Li
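If only the identifiers are needed rather than a one-column data frame, the same counting idea can be sketched with tapply(). This is an alternative sketch, not part of the answer above. To keep it self-contained it rebuilds only the study_id and scale columns from the data in the question, and the names n_scales and multi_scale_ids are made up for illustration.

```r
# Reduced copy of DATA: only the two columns the question is about
DATA <- data.frame(
  study_id = c(rep("Dreyer", 4), "Ghapanchi", "Hassan", "Kralova", rep("Li", 3)),
  scale    = c(rep("Other", 9), "MBTI")
)

# Number of distinct scale values within each study_id (a named integer vector)
n_scales <- tapply(DATA$scale, DATA$study_id, function(x) length(unique(x)))

# Names of the studies that use more than one scale
multi_scale_ids <- names(n_scales)[n_scales > 1]
multi_scale_ids
# [1] "Li"
```

Because tapply() returns the counts as numbers, the comparison with 1 is numeric. In the ave() version the counts are written back into a character vector (scale is character), so that comparison relies on string coercion.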