How to change specific values to NA in multiple columns within a specific subset of the dataset

Time:11-07

I would like to change specific values to missing values using multiple conditions. In other words, I would like to change a specific value in multiple columns, but only for a specific group in my dataset. Suppose I have the following dataset:

df <- data.frame(id = c("A", "A", "A", "A", "B", "B","B", "B"),
                  x1 = c(1, 99, 2, 99, 3, 99, 5, 6),
                  x2 = c(99, 1, 99, 2, 3, 4, 99, 6))

df

  id x1 x2
1  A  1 99
2  A 99  1
3  A  2 99
4  A 99  2
5  B  3  3
6  B 99  4
7  B  5 99
8  B  6  6

I would like to change the value 99 to NA, but only in the subset of my dataset where id equals "A". This is a simple example; my real dataset has many more columns. I am trying to do something like this:

col <- c("x1", "x2")

df[, col] <- ifelse(df$id == "A" & df[,col] == 99, NA, df[,col])

I tried other variations of this code, but I keep getting error messages and am not sure what I am doing wrong. Does anyone have a suggestion, or know what I am getting wrong?

CodePudding user response:

ifelse often does not behave quite as expected. It is important to remember that, according to its documentation (?ifelse), it:

"returns a value with the same shape as test"

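To make the quoted sentence concrete, here is a small sketch of how the shape of the result follows the test argument rather than the yes or no arguments. The names short, m and res are made up for this illustration and are not part of the question.

```r
# A length-3 test gives a length-3 result.
ifelse(c(TRUE, FALSE, TRUE), "yes", "no")

# A length-1 test gives a length-1 result, even though `yes` has three elements.
short <- ifelse(TRUE, c(1, 2, 3), 0)

# A matrix test gives a matrix result with the same dimensions.
m <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
res <- ifelse(m, 1, 0)
dim(res)
```

In the question, the test is a logical matrix while the no argument is a data frame, so the pieces do not line up the way the assignment back into df[, col] expects.
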
I think replace can work here, because it keeps the shape of df[, col] and accepts a logical matrix as the index:

df[, col] <- replace(df[, col], 
                     df[, col] == 99 & df$id == "A",
                     NA)

Result:

  id x1 x2
1  A  1 NA
2  A NA  1
3  A  2 NA
4  A NA  2
5  B  3  3
6  B 99  4
7  B  5 99
8  B  6  6
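
As a possible alternative, the same condition can be stored as a logical matrix and used directly for assignment. This is only a minimal sketch on the example data from the question; the name mask is introduced here for illustration.

```r
df <- data.frame(id = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 x1 = c(1, 99, 2, 99, 3, 99, 5, 6),
                 x2 = c(99, 1, 99, 2, 3, 4, 99, 6))
col <- c("x1", "x2")

# Logical matrix: TRUE where the value is 99 AND the row belongs to group A.
# df$id == "A" has one element per row and is recycled down each column.
mask <- df[, col] == 99 & df$id == "A"

# Set only the flagged cells to NA; every other cell is left untouched.
df[, col][mask] <- NA

df
```

Because col is just a character vector of column names, the same lines should carry over to a wider dataset by listing more names in col, provided those columns can all be compared against 99.
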