I have a data frame that has 238 rows and 10 columns. I want to create a new column at the end that contains a dummy variable that says "yes" if the number "1" exists in any of the 10 columns and "no" if none of the columns have "1" in them. I tried
df$dummy = (ifelse(any(x == 1) %in% df[], 'yes', 'no'))
view(df)
but it didn't work. Any input would be greatly appreciated!
CodePudding user response:
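If you want to use any(), you have to apply() it to the rows (MARGIN = 1). Called on the whole data frame, any() collapses everything to a single TRUE or FALSE instead of one value per row.
The solution given by @dash2 is of course a lot shorter and most likely also faster (see rowSums(df == 1) in the comments to the question).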
# Create a dummy data set
df <- data.frame(c1 = sample.int(10, size=10, replace = TRUE))
for (i in 2:10)
df[[paste0("c", i)]] <- sample.int(10, size=10, replace = TRUE)
df$new <- apply(df, MARGIN=1, function(x) any(x == 1))
df
#> c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 new
#> 1 5 6 1 2 5 8 10 6 4 7 TRUE
#> 2 9 3 7 3 2 4 2 8 3 4 FALSE
#> 3 6 2 2 9 8 1 6 6 10 8 TRUE
#> 4 10 5 3 6 6 7 3 6 2 8 FALSE
#> 5 4 8 2 10 10 5 5 2 10 10 FALSE
#> 6 4 2 8 8 2 9 7 7 2 2 FALSE
#> 7 9 4 3 3 7 5 10 6 3 8 FALSE
#> 8 7 7 6 9 3 2 2 7 3 2 FALSE
#> 9 7 7 9 9 1 1 3 2 5 5 TRUE
#> 10 3 2 10 2 3 5 2 1 4 3 TRUE
Created on 2022-06-26 by the reprex package (v2.0.1)
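For comparison, here is a minimal sketch of the rowSums(df == 1) approach mentioned above, extended to produce the "yes"/"no" labels the question asks for. The small data frame and its column names (c1 to c3) are made up for illustration and are not the asker's data.

```r
# Small made-up data frame; column names are illustrative only
df <- data.frame(c1 = c(5, 9, 1), c2 = c(1, 3, 2), c3 = c(2, 7, 1))

# df == 1 gives a logical matrix; rowSums() counts the TRUEs in each row
has_one <- rowSums(df == 1) > 0

# Same test with apply(): any() is evaluated once per row (MARGIN = 1)
has_one_apply <- apply(df == 1, MARGIN = 1, FUN = any)

# Turn the logical vector into the "yes"/"no" labels
df$dummy <- ifelse(has_one, "yes", "no")
df
```

Both has_one and has_one_apply are computed before the dummy column is added, so the new character column is never compared against 1.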
CodePudding user response:
> # Create a random data frame with 2 variables ranging from 0 to 9
> library(dplyr)
>
> set.seed(200)
>
> df <- data.frame(val1 = sample(0:9, 100, replace = TRUE),
+                  val2 = sample(0:9, 100, replace = TRUE))
> df %>% filter(val1 ==1 | val2 ==1)
val1 val2
1 1 7
2 1 8
3 1 6
4 5 1
5 7 1
6 1 4
7 2 1
8 7 1
9 1 8
10 6 1
11 9 1
12 2 1
13 7 1
14 1 0
15 0 1
16 3 1
17 1 5
18 9 1
19 7 1
20 0 1
21 1 1
> # There are 21 rows with a 1 in at least one of the two variables.
>
> # Add a "dummy" column and store the result in a new data frame
> df1 <- df %>%
+   mutate(dummy = ifelse(rowSums(df == 1) > 0, "yes", "no"))
>
>
> df1 %>%filter(val1 == 1 | val2 == 1)
val1 val2 dummy
1 1 7 yes
2 1 8 yes
3 1 6 yes
4 5 1 yes
5 7 1 yes
6 1 4 yes
7 2 1 yes
8 7 1 yes
9 1 8 yes
10 6 1 yes
11 9 1 yes
12 2 1 yes
13 7 1 yes
14 1 0 yes
15 0 1 yes
16 3 1 yes
17 1 5 yes
18 9 1 yes
19 7 1 yes
20 0 1 yes
21 1 1 yes
> # The same 21 rows come back, all labelled "yes"
>
> #a random sample of the dataframe
> sample_n(df1,10)
val1 val2 dummy
1 2 7 no
2 7 6 no
3 5 8 no
4 7 9 no
5 5 4 no
6 9 1 yes
7 5 6 no
8 6 9 no
9 6 6 no
10 3 7 no
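One caveat with the rowSums() test: if the real data contains NA values, or columns that should not be checked, the comparison needs a little more care. The following is a hedged sketch in base R; the data frame, the id column, and the cols vector are assumptions made up for illustration.

```r
# Made-up data with missing values and an id column (names are illustrative)
df <- data.frame(id = c(1, 2, 3), val1 = c(4, NA, 1), val2 = c(3, 5, NA))

# Columns to check; "id" is left out so that its 1 does not count
cols <- c("val1", "val2")

# na.rm = TRUE makes rowSums() skip NA comparisons instead of returning NA
df$dummy <- ifelse(rowSums(df[cols] == 1, na.rm = TRUE) > 0, "yes", "no")
df
```

Without na.rm = TRUE, any row containing an NA would get an NA count, and ifelse() would pass that NA through to the dummy column.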