Create subsets of a sample by different variables simultaneously

I have a data frame like the following. Variables a and b are continuous, and variables v1-v7 are binary.

> df <- data.frame(a= c(1,1,2,3,5),
                       b  = c(3, 6,8, 2, 4),
                       v1 = c(0,0,0,0,0),
                       v2 = c(1,0,0,0,0),
                       v3 = c(0,1,1,1,1),
                       v4 = c(0,1,1,1,1),
                       v5 = c(0,0,0,0,1),
                       v6 = c(0,0,0,0,0),
                       v7 = c(0,0,0,0,0))
> df
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0
2 1 6  0  0  1  1  0  0  0
3 2 8  0  0  1  1  0  0  0
4 3 2  0  0  1  1  0  0  0
5 5 4  0  0  1  1  1  0  0

I want to create seven subsamples from the data frame shown above, one for each of v1-v7. Each subsample should include variables a and b for the rows where the corresponding variable (v1-v7) equals 1. For example,

> df1 <- df %>% filter(v1==1)
> df1
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
> df2 <- df %>% filter(v2==1)
> df2
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0
> df3 <- df %>% filter(v3==1)
> df3
  a b v1 v2 v3 v4 v5 v6 v7
1 1 6  0  0  1  1  0  0  0
2 2 8  0  0  1  1  0  0  0
3 3 2  0  0  1  1  0  0  0
4 5 4  0  0  1  1  1  0  0

How can I do all of these at once in R? Thanks.

CodePudding user response:

Loop over the columns v1 to v7, filter on each one, and return the results in a list:

library(dplyr)
library(stringr)
library(purrr)
# Pick the column names made of "v" followed by digits, then filter on each one
lst1 <- str_subset(names(df), "^v\\d+$") %>%
  map(~ df %>%
        filter(if_all(all_of(.x), ~ .x == 1)))
names(lst1) <- str_c('df', seq_along(lst1))

It is better to keep the results in a list. If separate objects are needed in the global environment (not recommended), use list2env on the named list:

list2env(lst1, .GlobalEnv)
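
To illustrate why keeping the results in a list can be convenient, here is a minimal sketch that rebuilds the named list and then summarises or stacks it in one call. The names counts and stacked are only illustrative and are not part of the answer above.

```r
library(dplyr)
library(stringr)
library(purrr)

# Data frame from the question
df <- data.frame(a  = c(1, 1, 2, 3, 5),
                 b  = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# One filtered data frame per v-column, named df1 ... df7
lst1 <- str_subset(names(df), "^v\\d+$") %>%
  map(~ df %>%
        filter(if_all(all_of(.x), ~ .x == 1)))
names(lst1) <- str_c("df", seq_along(lst1))

# Number of rows in each subsample, as a named integer vector
counts <- map_int(lst1, nrow)

# Stack all subsamples, recording the list name each row came from
stacked <- bind_rows(lst1, .id = "subsample")
```

A single subsample is still available by name, for example lst1$df3 or lst1[["df3"]].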

CodePudding user response:

In dplyr, you can refer to a column by its name as a character string using the .data pronoun (see data masking):

df_samples <- list()
for(i in 1:7)
  df_samples[[i]] <- filter(df, .data[[paste0("v", i)]] == 1)
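
The question also asks for subsamples that contain only a and b. A possible variation on the loop, sketched here under the assumption that the flag columns are exactly those named "v" followed by digits, looks up the column names instead of hard-coding 1:7 and adds a select() step. The names flag_cols and col are illustrative.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(a  = c(1, 1, 2, 3, 5),
                 b  = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Assumption: flag columns are named "v" followed by one or more digits
flag_cols <- grep("^v[0-9]+$", names(df), value = TRUE)

df_samples <- list()
for (col in flag_cols) {
  # .data[[col]] looks up the column whose name is stored in col
  df_samples[[col]] <- df %>%
    filter(.data[[col]] == 1) %>%
    select(a, b)
}
```

Because the list elements are named after the flag columns, df_samples$v5 is the subsample for v5.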

CodePudding user response:

Here's a way with lapply(). You are better off keeping your results in a list. The subsample for v1 is subsamples[[1]], and so on.

subsamples <- lapply(3:9, function(x) df[df[[x]]==1, ])
subsamples

[[1]]
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)

[[2]]
  a b v1 v2 v3 v4 v5 v6 v7
1 1 3  0  1  0  0  0  0  0

[[3]]
  a b v1 v2 v3 v4 v5 v6 v7
2 1 6  0  0  1  1  0  0  0
3 2 8  0  0  1  1  0  0  0
4 3 2  0  0  1  1  0  0  0
5 5 4  0  0  1  1  1  0  0

[[4]]
  a b v1 v2 v3 v4 v5 v6 v7
2 1 6  0  0  1  1  0  0  0
3 2 8  0  0  1  1  0  0  0
4 3 2  0  0  1  1  0  0  0
5 5 4  0  0  1  1  1  0  0

[[5]]
  a b v1 v2 v3 v4 v5 v6 v7
5 5 4  0  0  1  1  1  0  0

[[6]]
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)

[[7]]
[1] a  b  v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
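
One caveat with logical indexing in base R: if a flag column contains NA, df[df[[x]] == 1, ] returns an extra row filled with NA. The data in the question has no missing values, so this does not affect the output above. The sketch below uses a hypothetical copy, df_na, with one NA added, to show how wrapping the condition in which() avoids this. It also names the list elements and keeps only a and b.

```r
# Data frame from the question
df <- data.frame(a  = c(1, 1, 2, 3, 5),
                 b  = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0),
                 v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1),
                 v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1),
                 v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Hypothetical variant of the data with a missing flag (not in the question)
df_na <- df
df_na$v3[1] <- NA

# Plain logical indexing keeps a row of NAs for the missing flag
with_na_row <- df_na[df_na$v3 == 1, ]

# which() returns only the positions where the condition is TRUE
flag_cols <- paste0("v", 1:7)
subsamples_ab <- lapply(setNames(flag_cols, flag_cols), function(col) {
  df_na[which(df_na[[col]] == 1), c("a", "b")]
})
```

Here with_na_row has five rows (one all-NA row plus the four matches), while subsamples_ab$v3 has only the four matching rows.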