Finding the grouping variable for which the unique values of a variable is more than one

Time:07-02

In DATA below, how can I find each unique study_id for which the variable scale takes on more than one unique value?

The expected answer is Li (scale for Li takes the values Other and MBTI). How can I find it with base R or dplyr code?

m="
study_id   year es_id       r     se     n pub_type  context  ed_setting  age_grp L1    L2    prof  scale outcome
Dreyer     1992   130  0      0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Listen~
Dreyer     1992   131  0.04   0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Writing
Dreyer     1992   132 -0.03   0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Reading
Dreyer     1992   133  0      0.0574   305 DocDisse~ Foreign~ CollegeUni~ Adult   Afri~ Engl~ NA    Other Overall
Ghapanchi  2011    89  0.31   0.0806   141 JournalA~ Foreign~ CollegeUni~ Adult   Pers~ Engl~ NA    Other Overall
Hassan     2001   177  0.25   0.117     71 NA        Foreign~ CollegeUni~ NA      Arab~ Engl~ NA    Other Speaki~
Kralova    2012   137  0.0252 0.117     75 JournalA~ Foreign~ CollegeUni~ Adult   Slov~ Engl~ Inte~ Other Speaki~
Li         2009    55 -0.04   0.132     59 JournalA~ Foreign~ CollegeUni~ Adult   Chin~ Engl~ NA    Other Grammar
Li         2009    56  0.355  0.124     59 JournalA~ Foreign~ CollegeUni~ Adult   Chin~ Engl~ NA    Other Pragma~
Li         2003    57  0.039  0.0735   187 JournalA~ Foreign~ CollegeUni~ Multip~ Chin~ Engl~ NA    MBTI  Overall
"

DATA <- read.table(text = m, header = TRUE)

CodePudding user response:

Here's a way in dplyr as well as base R -

The idea is to keep the rows of every study_id group that has more than one distinct scale value, and then reduce the result to the unique study_id values.

library(dplyr)

DATA %>%
  group_by(study_id) %>%
  dplyr::filter(n_distinct(scale) > 1) %>%
  ungroup() %>%
  distinct(study_id)

# study_id
#  <chr>   
#1 Li      

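To see how many distinct scale values each study has before filtering, the per-group count can be materialised first. This is only a sketch: `d` is a reduced copy of DATA holding just the study_id and scale columns from the question, and the names `d`, `counts`, `n_scales` and `multi` are made up for illustration.

```r
library(dplyr)

# Reduced copy of DATA: only the two columns the question depends on
d <- data.frame(
  study_id = c("Dreyer", "Dreyer", "Dreyer", "Dreyer", "Ghapanchi",
               "Hassan", "Kralova", "Li", "Li", "Li"),
  scale    = c(rep("Other", 9), "MBTI")
)

# One row per study with its number of distinct scale values
counts <- d %>%
  group_by(study_id) %>%
  summarise(n_scales = n_distinct(scale))

# Keep only the studies that use more than one scale
multi <- counts %>%
  filter(n_scales > 1)

multi
```

Keeping `counts` around makes it easy to check the other studies as well, since each of them should show a count of 1.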
Base R -

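# as.integer(factor(scale)) keeps the ave() result numeric, whether scale is character or factor
unique(subset(DATA, ave(as.integer(factor(scale)), study_id,
       FUN = function(x) length(unique(x))) > 1, select = study_id))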

#  study_id
#8       Li
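If only the study names are needed rather than a data frame, a `tapply()` variant may be simpler to read. Again a sketch under assumptions: `d` is a reduced copy of DATA with just the study_id and scale columns from the question, and `n_scales` and `multi_ids` are illustrative names.

```r
# Reduced copy of DATA: only the two columns the question depends on
d <- data.frame(
  study_id = c("Dreyer", "Dreyer", "Dreyer", "Dreyer", "Ghapanchi",
               "Hassan", "Kralova", "Li", "Li", "Li"),
  scale    = c(rep("Other", 9), "MBTI")
)

# Number of distinct scale values per study, named by study_id
n_scales <- tapply(d$scale, d$study_id, function(x) length(unique(x)))

# Names of the studies with more than one distinct scale value
multi_ids <- names(n_scales)[n_scales > 1]

multi_ids
```

This returns a plain character vector, which is convenient when the result is used to subset DATA afterwards, for example `DATA[DATA$study_id %in% multi_ids, ]`.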