I have the following data frame. Variables a and b are continuous, and variables v1-v7 are binary.
> df <- data.frame(a= c(1,1,2,3,5),
b = c(3, 6,8, 2, 4),
v1 = c(0,0,0,0,0),
v2 = c(1,0,0,0,0),
v3 = c(0,1,1,1,1),
v4 = c(0,1,1,1,1),
v5 = c(0,0,0,0,1),
v6 = c(0,0,0,0,0),
v7 = c(0,0,0,0,0))
> df
a b v1 v2 v3 v4 v5 v6 v7
1 1 3 0 1 0 0 0 0 0
2 1 6 0 0 1 1 0 0 0
3 2 8 0 0 1 1 0 0 0
4 3 2 0 0 1 1 0 0 0
5 5 4 0 0 1 1 1 0 0
I want to create seven subsamples from the data frame shown above, one for each of v1-v7. Each subsample should include only variables a and b, restricted to the rows where the corresponding variable (v1 to v7) equals 1. For example,
> df1 <- df %>% filter(v1==1)
> df1
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
> df2 <- df %>% filter(v2==1)
> df2
a b v1 v2 v3 v4 v5 v6 v7
1 1 3 0 1 0 0 0 0 0
> df3 <- df %>% filter(v3==1)
> df3
a b v1 v2 v3 v4 v5 v6 v7
1 1 6 0 0 1 1 0 0 0
2 2 8 0 0 1 1 0 0 0
3 3 2 0 0 1 1 0 0 0
4 5 4 0 0 1 1 1 0 0
How can I do all of these at once in R? Thanks.
CodePudding user response:
Just loop over the columns 'v1' to 'v7', apply the filter to each, and return the results in a list.
library(dplyr)
library(stringr)
library(purrr)
# select the column names v1, v2, ... and filter df once per column
lst1 <- str_subset(names(df), "^v\\d+$") %>%
  map(~ df %>%
        filter(if_all(all_of(.x), ~ .x == 1)))
names(lst1) <- str_c('df', seq_along(lst1))
It is better to keep it in a list. If we need objects created in the global env (not recommended), use list2env on the named list:
list2env(lst1, .GlobalEnv)
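As a rough illustration of why the named list is usually enough on its own, here is a minimal sketch that summarises and prunes the subsamples without creating any global objects. It rebuilds the list in base R as a stand-in for the dplyr/purrr pipeline above (an assumption made only to keep the sketch free of package dependencies); the names lst, row_counts and non_empty are hypothetical.

```r
# Same data frame as in the question
df <- data.frame(a = c(1, 1, 2, 3, 5), b = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0), v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1), v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1), v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Base R stand-in for the pipeline above: one filtered data frame per v column
v_cols <- grep("^v[0-9]+$", names(df), value = TRUE)
lst <- lapply(v_cols, function(col) df[df[[col]] == 1, ])
names(lst) <- paste0("df", seq_along(lst))

# Work on every subsample at once instead of on seven separate objects
row_counts <- vapply(lst, nrow, integer(1))         # rows per subsample
non_empty <- Filter(function(d) nrow(d) > 0, lst)   # drop the empty ones

row_counts
names(non_empty)
```

Individual subsamples remain available by name, for example lst$df3 or lst[["df3"]].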
CodePudding user response:
In dplyr, you can specify a variable name as a character string with the .data pronoun (see data masking):
df_samples <- list()
for(i in 1:7)
df_samples[[i]] <- filter(df, .data[[paste0("v", i)]] == 1)
CodePudding user response:
Here's a way with lapply(). You are better off keeping your results in a list. The subsample for v1 would be subsamples[[1]], and so on.
subsamples <- lapply(3:9, function(x) df[df[[x]]==1, ])
subsamples
[[1]]
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
[[2]]
a b v1 v2 v3 v4 v5 v6 v7
1 1 3 0 1 0 0 0 0 0
[[3]]
a b v1 v2 v3 v4 v5 v6 v7
2 1 6 0 0 1 1 0 0 0
3 2 8 0 0 1 1 0 0 0
4 3 2 0 0 1 1 0 0 0
5 5 4 0 0 1 1 1 0 0
[[4]]
a b v1 v2 v3 v4 v5 v6 v7
2 1 6 0 0 1 1 0 0 0
3 2 8 0 0 1 1 0 0 0
4 3 2 0 0 1 1 0 0 0
5 5 4 0 0 1 1 1 0 0
[[5]]
a b v1 v2 v3 v4 v5 v6 v7
5 5 4 0 0 1 1 1 0 0
[[6]]
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
[[7]]
[1] a b v1 v2 v3 v4 v5 v6 v7
<0 rows> (or 0-length row.names)
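The question also asks for subsamples that contain only variables a and b, whereas the outputs above keep every column. A minimal base R sketch of that variant, assuming the v columns are identified by name rather than by position (v_cols and subsamples_ab are hypothetical names):

```r
# Same data frame as in the question
df <- data.frame(a = c(1, 1, 2, 3, 5), b = c(3, 6, 8, 2, 4),
                 v1 = c(0, 0, 0, 0, 0), v2 = c(1, 0, 0, 0, 0),
                 v3 = c(0, 1, 1, 1, 1), v4 = c(0, 1, 1, 1, 1),
                 v5 = c(0, 0, 0, 0, 1), v6 = c(0, 0, 0, 0, 0),
                 v7 = c(0, 0, 0, 0, 0))

# Pick the indicator columns by name so the code does not depend on column order
v_cols <- grep("^v[0-9]+$", names(df), value = TRUE)

# For each indicator, keep the rows where it equals 1 and only columns a and b;
# setNames() makes the resulting list elements accessible as $v1, ..., $v7
subsamples_ab <- lapply(setNames(v_cols, v_cols),
                        function(col) df[df[[col]] == 1, c("a", "b")])

subsamples_ab$v3
```

Base R subsetting keeps the original row names (2 to 5 for v3), unlike dplyr's filter(), which renumbers them.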