Replace all values in dataframe with an ifelse and %in% operator

Time:12-04

I'd like to recode all the values in a data set with an ifelse statement, using %in% as the operator. Here is some example data.

set.seed(123)
df <- data.frame( matrix(sample(1:5,20,TRUE), 4, 5) )
df 

> df 
  X1 X2 X3 X4 X5
1  3  3  2  3  1
2  3  5  3  1  5
3  2  4  5  4  3
4  2  1  3  1  2

I know I can convert values of 4 or 5 to 1, and all other values to 0, with this code:

df_result <- ifelse( df == 4 | df == 5 , 1, 0) 

> df_result
     X1 X2 X3 X4 X5
[1,]  0  0  0  0  0
[2,]  0  1  0  0  1
[3,]  0  1  1  1  0
[4,]  0  0  0  0  0

However, I would prefer to use %in% c(4,5) like this, but it doesn't work:

ifelse( df %in% c(4,5) , 1, 0) 
[1] 0 0 0 0 0

It just returns a vector of five zeros instead of a recoded table. Is it possible to tweak it to make this work? I want to use the %in% operator because the c(4,5) part can contain a large number of items, which makes repeating OR impractical. Any method is fine as long as I can write the lookup values as a vector like c(4,5). Any help is greatly appreciated.

CodePudding user response:

Here are some base R options:

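The attempt in the question fails because %in% treats the whole data frame as a list of five columns, so it returns five values rather than one per cell. The first option instead maps a function over the columns of the data frame. The function tests whether each element of a column is in your vector, which returns logical values; as.numeric then converts these to 0 and 1. Map returns a list, so data.frame converts the result back to a data frame.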

data.frame(Map(function(x) as.numeric(x %in% c(4,5)), df))

  X1 X2 X3 X4 X5
1  0  0  0  0  0
2  0  1  0  0  1
3  0  1  1  1  0
4  0  0  0  0  0

Similarly,

data.frame(apply(df, 2, function(x) as.numeric(x %in% c(4,5))))
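
A related idiom, sketched here on the assumption that the result should keep the data frame's shape and column names, assigns into `df_in_place[]` so that no conversion back to a data frame is needed. The names `vals` and `df_in_place` are illustrative and do not come from the question; the data is typed out explicitly so the sketch does not depend on the random number generator.

```r
# Example data from the question, written out explicitly.
df <- data.frame(
  X1 = c(3, 3, 2, 2),
  X2 = c(3, 5, 4, 1),
  X3 = c(2, 3, 5, 3),
  X4 = c(3, 1, 4, 1),
  X5 = c(1, 5, 3, 2)
)

vals <- c(4, 5)  # hypothetical name for the lookup vector; it can be arbitrarily long

# Assigning into df_in_place[] replaces every column's contents
# while keeping the data frame structure and column names.
df_in_place <- df
df_in_place[] <- lapply(df, function(x) as.integer(x %in% vals))
df_in_place
```

Because `%in%` is applied one column at a time, each call sees an ordinary vector and returns one logical value per row.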

Alternatively, you can test the data frame against each vector element and then combine the results:

data.frame(Reduce(`+`, lapply(c(4,5), `==`, df)))

  X1 X2 X3 X4 X5
1  0  0  0  0  0
2  0  1  0  0  1
3  0  1  1  1  0
4  0  0  0  0  0

lapply tests df == 4 and stores the resulting logical matrix as a list element, then does the same for df == 5. Reduce then adds these two matrices element-wise. Because a cell can equal at most one of the distinct values 4 and 5, each sum is either 0 or 1.
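
To make the intermediate steps concrete, here is a minimal sketch of the same idea. It also shows a variant, not part of the answer above, that combines the matrices with a logical OR instead of a sum: the OR stays 0/1 even if the lookup vector contains a duplicate, whereas the sum would count the duplicate twice. The names `vals`, `hits`, `any_hit`, `summed`, and `result` are hypothetical.

```r
# Example data from the question, written out explicitly.
df <- data.frame(
  X1 = c(3, 3, 2, 2),
  X2 = c(3, 5, 4, 1),
  X3 = c(2, 3, 5, 3),
  X4 = c(3, 1, 4, 1),
  X5 = c(1, 5, 3, 2)
)

vals <- c(4, 5, 5)  # deliberately contains a duplicate value

# One logical matrix per lookup value (val == df for each val).
hits <- lapply(vals, `==`, df)

# Element-wise OR: TRUE wherever any lookup value matched.
any_hit <- Reduce(`|`, hits)

# Element-wise sum: counts matches, so the duplicated 5 is counted twice.
summed <- Reduce(`+`, hits)

# Unary plus converts TRUE/FALSE to 1/0 while keeping the matrix shape.
result <- data.frame(+any_hit)
result
```

With a lookup vector of distinct values the two approaches agree, so the choice mainly matters when the vector is built programmatically and might contain repeats.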

CodePudding user response:

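Using dplyr from the tidyverse, we can write a purrr-style lambda (a formula in which . stands for the current column) and apply it to every column of the data frame with across(). Selecting the columns explicitly with everything() avoids the deprecation warning that newer dplyr versions give when across() is called without a column selection.

library(dplyr)
df %>%
  mutate(across(everything(), ~ ifelse(. %in% c(4, 5), 1, 0)))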

  X1 X2 X3 X4 X5
1  0  0  0  0  0
2  0  1  0  0  1
3  0  1  1  1  0
4  0  0  0  0  0

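However, be careful when the data frame mixes numeric (double) and integer columns: %in% coerces both sides to a common type before matching, so the result can be misleading. Here the integer 1L matches the double 1: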

c(1, 1L) %in% c(1, 1.1)
[1] TRUE TRUE
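
A few more cases, sketched here for illustration, show how the coercion in `%in%` behaves with doubles and character values. The variable names are hypothetical.

```r
# Integer and double values compare equal after coercion to a common type.
int_vs_double <- c(1, 1L) %in% c(1, 1.1)   # TRUE TRUE

# Doubles are matched exactly, so floating-point rounding can cause a miss.
rounding_miss <- (0.1 + 0.2) %in% 0.3      # FALSE

# With a character value, the numeric lookup vector is coerced to strings.
char_match <- "4" %in% c(4, 5)             # TRUE

int_vs_double
rounding_miss
char_match
```

If a column holds computed doubles rather than whole numbers, rounding it before calling `%in%` is one way to avoid the exact-match problem.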