Nested subsetting in a dataframe in R

Time:12-10

How can I subset the data below so that I end up with 4 studies, consisting of:

(A) 2 unique studies for which study_type == standard: 1 study with reporting == subscale and 1 study with reporting == composite (like studies 1 and 3)

AND

(B) 2 unique studies for which study_type == alternative: 1 study with reporting == subscale and 1 study with reporting == composite (like studies 5 and 7)

Is this possible in R?

m="
study subscale  reporting  obs include yi   vi         study_type
1        A      subscale   1   yes     1.94 0.33503768 standard
1        A      subscale   2   yes     1.06 0.01076604 standard
2        A      subscale   3   yes     2.41 0.23767389 standard
2        A      subscale   4   yes     2.34 0.37539841 standard
3        A&C    composite  5   yes     3.09 0.31349510 standard
3        A&C    composite  6   yes     3.99 0.01349510 standard
4        A&B    composite  7   yes     2.90 0.91349510 standard
4        A&B    composite  8   yes     3.01 0.99349510 standard
5        G&H    composite  9   yes     1.01 0.99910197 alternative
5        G&H    composite  10  yes     2.10 0.97910095 alternative
6        E&G    composite  11  yes     0.11 0.27912095 alternative
6        E&G    composite  12  yes     3.12 0.87910095 alternative
7        E      subscale   13  yes     0.08 0.21670360 alternative
7        G      subscale   14  yes     1.00 0.91597190 alternative
8        F      subscale   15  yes     1.08 0.81670360 alternative
8        E      subscale   16  yes     0.99 0.91297170 alternative"
data <- read.table(text = m, header = TRUE)

CodePudding user response:

If I understand you correctly, you could use dplyr::distinct() to keep the first row of each study_type/reporting combination:


library(tidyverse)

data %>%
  distinct(study_type, reporting, .keep_all = TRUE)
#>   study subscale reporting obs include   yi        vi  study_type
#> 1     1        A  subscale   1     yes 1.94 0.3350377    standard
#> 2     3      A&C composite   5     yes 3.09 0.3134951    standard
#> 3     5      G&H composite   9     yes 1.01 0.9991020 alternative
#> 4     7        E  subscale  13     yes 0.08 0.2167036 alternative
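
Note that distinct() with .keep_all = TRUE keeps only the first row of each study_type/reporting combination, so each selected study loses its other observations. If all rows of the four studies are wanted, one possible sketch is to use the distinct() result only to pick the study ids and then semi_join() back to the full data. This assumes each study id occurs under a single study_type/reporting combination, as in the example data; picked and result are illustrative names.

```r
library(dplyr)

# Same data as in the question.
m <- "
study subscale  reporting  obs include yi   vi         study_type
1        A      subscale   1   yes     1.94 0.33503768 standard
1        A      subscale   2   yes     1.06 0.01076604 standard
2        A      subscale   3   yes     2.41 0.23767389 standard
2        A      subscale   4   yes     2.34 0.37539841 standard
3        A&C    composite  5   yes     3.09 0.31349510 standard
3        A&C    composite  6   yes     3.99 0.01349510 standard
4        A&B    composite  7   yes     2.90 0.91349510 standard
4        A&B    composite  8   yes     3.01 0.99349510 standard
5        G&H    composite  9   yes     1.01 0.99910197 alternative
5        G&H    composite  10  yes     2.10 0.97910095 alternative
6        E&G    composite  11  yes     0.11 0.27912095 alternative
6        E&G    composite  12  yes     3.12 0.87910095 alternative
7        E      subscale   13  yes     0.08 0.21670360 alternative
7        G      subscale   14  yes     1.00 0.91597190 alternative
8        F      subscale   15  yes     1.08 0.81670360 alternative
8        E      subscale   16  yes     0.99 0.91297170 alternative"
data <- read.table(text = m, header = TRUE)

# First study encountered for each study_type/reporting combination.
# Assumption: a study id never appears under more than one combination.
picked <- data %>%
  distinct(study_type, reporting, .keep_all = TRUE) %>%
  select(study)

# Keep every row of the full data that belongs to a picked study.
result <- data %>%
  semi_join(picked, by = "study")

result
```

With the example data this keeps both observations of studies 1, 3, 5 and 7.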

CodePudding user response:

If you are asking how to filter your data into the four subsets you described, you could do this:

> study1 <- dplyr::filter(data, study_type == "standard" & reporting == "subscale")
> study1
  study subscale reporting obs include   yi         vi study_type
1     1        A  subscale   1     yes 1.94 0.33503768   standard
2     1        A  subscale   2     yes 1.06 0.01076604   standard
3     2        A  subscale   3     yes 2.41 0.23767389   standard
4     2        A  subscale   4     yes 2.34 0.37539841   standard
> study2 <- dplyr::filter(data, study_type == "standard" & reporting == "composite")
> study2
  study subscale reporting obs include   yi        vi study_type
1     3      A&C composite   5     yes 3.09 0.3134951   standard
2     3      A&C composite   6     yes 3.99 0.0134951   standard
3     4      A&B composite   7     yes 2.90 0.9134951   standard
4     4      A&B composite   8     yes 3.01 0.9934951   standard
> study3 <- dplyr::filter(data, study_type == "alternative" & reporting == "subscale")
> study3
  study subscale reporting obs include   yi        vi  study_type
1     7        E  subscale  13     yes 0.08 0.2167036 alternative
2     7        G  subscale  14     yes 1.00 0.9159719 alternative
3     8        F  subscale  15     yes 1.08 0.8167036 alternative
4     8        E  subscale  16     yes 0.99 0.9129717 alternative
> study4 <- dplyr::filter(data, study_type == "alternative" & reporting == "composite")
> study4
  study subscale reporting obs include   yi       vi  study_type
1     5      G&H composite   9     yes 1.01 0.999102 alternative
2     5      G&H composite  10     yes 2.10 0.979101 alternative
3     6      E&G composite  11     yes 0.11 0.279121 alternative
4     6      E&G composite  12     yes 3.12 0.879101 alternative
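
The four filter() calls can also be collapsed into a single step. A minimal base R sketch, assuming the same data as in the question: split() on both grouping columns returns a named list with one data frame per study_type/reporting combination. The name subsets is illustrative.

```r
# Same data as in the question.
m <- "
study subscale  reporting  obs include yi   vi         study_type
1        A      subscale   1   yes     1.94 0.33503768 standard
1        A      subscale   2   yes     1.06 0.01076604 standard
2        A      subscale   3   yes     2.41 0.23767389 standard
2        A      subscale   4   yes     2.34 0.37539841 standard
3        A&C    composite  5   yes     3.09 0.31349510 standard
3        A&C    composite  6   yes     3.99 0.01349510 standard
4        A&B    composite  7   yes     2.90 0.91349510 standard
4        A&B    composite  8   yes     3.01 0.99349510 standard
5        G&H    composite  9   yes     1.01 0.99910197 alternative
5        G&H    composite  10  yes     2.10 0.97910095 alternative
6        E&G    composite  11  yes     0.11 0.27912095 alternative
6        E&G    composite  12  yes     3.12 0.87910095 alternative
7        E      subscale   13  yes     0.08 0.21670360 alternative
7        G      subscale   14  yes     1.00 0.91597190 alternative
8        F      subscale   15  yes     1.08 0.81670360 alternative
8        E      subscale   16  yes     0.99 0.91297170 alternative"
data <- read.table(text = m, header = TRUE)

# One list element per study_type/reporting combination,
# named "<study_type>.<reporting>", e.g. "standard.subscale".
subsets <- split(data, list(data$study_type, data$reporting))

names(subsets)
subsets[["standard.subscale"]]
```

Each element can then be processed with lapply() instead of handling study1 to study4 by hand.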