R: Create a dummy variable if a particular value exists in any of the previous columns

Time:06-27

I have a data frame that has 238 rows and 10 columns. I want to create a new column at the end that contains a dummy variable that says "yes" if the number "1" exists in any of the 10 columns and "no" if none of the columns have "1" in them. I tried

df$dummy = (ifelse(any(x == 1) %in% df[], 'yes', 'no'))
view(df)

but it didn't work. Any input would be greatly appreciated!

CodePudding user response:

If you want to use any(), you have to apply it row by row, i.e. call apply() with MARGIN=1, so that each row is checked separately.

The solution given by @dash2 is of course a lot shorter and most likely also faster (see rowSums(df==1) in the comments to the question).
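
The rowSums idea mentioned above can be sketched roughly as follows. This is a minimal example on a small hand-made data frame; the column names c1 to c3, the helper has_one and the column name dummy are only illustrative and do not come from the question.

```r
# Small illustrative data frame (values chosen by hand, not from the question)
df <- data.frame(c1 = c(5, 1, 3),
                 c2 = c(1, 2, 3),
                 c3 = c(7, 8, 9))

# df == 1 is a logical matrix; rowSums() counts the TRUEs in each row
has_one <- rowSums(df == 1) > 0

# Turn the logical vector into the "yes"/"no" labels asked for in the question
df$dummy <- ifelse(has_one, "yes", "no")
df$dummy
```

Because the comparison and the row sum are both vectorised, this avoids calling a function once per row.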

# Create a dummy data set
df <- data.frame(c1 = sample.int(10, size=10, replace = TRUE))
for (i in 2:10)
  df[[paste0("c", i)]] <- sample.int(10, size=10, replace = TRUE)

df$new <- apply(df, MARGIN=1, function(x) any(x == 1))
df
#>    c1 c2 c3 c4 c5 c6 c7 c8 c9 c10   new
#> 1   5  6  1  2  5  8 10  6  4   7  TRUE
#> 2   9  3  7  3  2  4  2  8  3   4 FALSE
#> 3   6  2  2  9  8  1  6  6 10   8  TRUE
#> 4  10  5  3  6  6  7  3  6  2   8 FALSE
#> 5   4  8  2 10 10  5  5  2 10  10 FALSE
#> 6   4  2  8  8  2  9  7  7  2   2 FALSE
#> 7   9  4  3  3  7  5 10  6  3   8 FALSE
#> 8   7  7  6  9  3  2  2  7  3   2 FALSE
#> 9   7  7  9  9  1  1  3  2  5   5  TRUE
#> 10  3  2 10  2  3  5  2  1  4   3  TRUE

Created on 2022-06-26 by the reprex package (v2.0.1)
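
The apply() call above returns TRUE/FALSE rather than the "yes"/"no" labels from the question. A possible variant, assuming the data may contain missing values and that only the original columns should be checked, might look like this (the data and the names value_cols and has_one are hypothetical):

```r
# Small illustrative data frame with missing values (hypothetical data)
df <- data.frame(c1 = c(5, NA, 3),
                 c2 = c(1, 2, 3),
                 c3 = c(7, 8, NA))

# Check only the original columns, so rerunning never scans the new column
value_cols <- c("c1", "c2", "c3")

# na.rm = TRUE makes any() ignore NAs instead of returning NA
has_one <- apply(df[value_cols], MARGIN = 1,
                 function(x) any(x == 1, na.rm = TRUE))

# Convert the logical result into "yes"/"no" labels
df$new <- ifelse(has_one, "yes", "no")
df$new
```

Without na.rm = TRUE, a row that contains an NA and no 1 would yield NA instead of FALSE.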

CodePudding user response:

> library(dplyr)
> 
> # Create a random data frame with 2 variables, each ranging from 0 to 9
> set.seed(200)
> 
> df <- data.frame(val1 = sample(0:9, 100, replace = TRUE),
                   val2 = sample(0:9, 100, replace = TRUE))
> df %>% filter(val1 == 1 | val2 == 1)
   val1 val2
1     1    7
2     1    8
3     1    6
4     5    1
5     7    1
6     1    4
7     2    1
8     7    1
9     1    8
10    6    1
11    9    1
12    2    1
13    7    1
14    1    0
15    0    1
16    3    1
17    1    5
18    9    1
19    7    1
20    0    1
21    1    1
> # We find 21 rows in which at least one of the two variables equals 1.
> 
> # Add a "dummy" column and store the result in a new data frame
> df1 <-  df %>%
    mutate(dummy = ifelse(rowSums(df==1) > 0, "yes", "no"))
> 
> 
> df1 %>%filter(val1 == 1 | val2 == 1)
   val1 val2 dummy
1     1    7   yes
2     1    8   yes
3     1    6   yes
4     5    1   yes
5     7    1   yes
6     1    4   yes
7     2    1   yes
8     7    1   yes
9     1    8   yes
10    6    1   yes
11    9    1   yes
12    2    1   yes
13    7    1   yes
14    1    0   yes
15    0    1   yes
16    3    1   yes
17    1    5   yes
18    9    1   yes
19    7    1   yes
20    0    1   yes
21    1    1   yes
> # We see the same 21 rows, all labeled "yes"
> 
> #a random sample of the dataframe
> sample_n(df1,10)
   val1 val2 dummy
1     2    7    no
2     7    6    no
3     5    8    no
4     7    9    no
5     5    4    no
6     9    1   yes
7     5    6    no
8     6    9    no
9     6    6    no
10    3    7    no
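
The mutate() call above refers to df by name inside the pipe. As a rough alternative, assuming dplyr 1.0.4 or later is installed, if_any() can test the piped data directly. The small data frame below is hypothetical and only serves as an illustration.

```r
library(dplyr)

# Small illustrative data frame (hypothetical values)
df <- data.frame(val1 = c(1, 5, 2),
                 val2 = c(7, 1, 3))

# if_any() is TRUE for a row when the condition holds in at least one
# of the selected columns (here: all columns)
df1 <- df %>%
  mutate(dummy = if_else(if_any(everything(), ~ .x == 1), "yes", "no"))

df1$dummy
```

Replacing everything() with a narrower selection such as c(val1, val2) would restrict the check to specific columns.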