I would like to change specific values to missing values based on multiple conditions. Put another way, I would like to change a specific value in multiple columns, but only for a specific group in my dataset. Suppose I have the following dataset:
df <- data.frame(id = c("A", "A", "A", "A", "B", "B","B", "B"),
x1 = c(1, 99, 2, 99, 3, 99, 5, 6),
x2 = c(99, 1, 99, 2, 3, 4, 99, 6))
df
  id x1 x2
1  A  1 99
2  A 99  1
3  A  2 99
4  A 99  2
5  B  3  3
6  B 99  4
7  B  5 99
8  B  6  6
I would like to change the value 99 to NA, but only for the rows where id equals A. This is a simplified example; my real dataset has many more columns. I am trying to do something like this:
col <- c("x1", "x2")
df[, col] <- ifelse(df$id == "A" & df[,col] == 99, NA, df[,col])
I have tried other variations of this code, but I keep getting error messages and I am not sure what I am doing wrong. Does anyone have a suggestion, or know what I am getting wrong?
Answer:
ifelse often does not behave quite as expected; it's important to remember that, according to the documentation (?ifelse), it "returns a value with the same shape as test".
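To illustrate the quoted behaviour, here is a minimal sketch using made-up values (the names short and m are hypothetical and not part of the question's data). It shows that the result takes its length and dimensions from test, not from the yes or no arguments:

```r
# test has length 1, so only the first element of `yes` is used
short <- ifelse(TRUE, c(1, 2, 3), 0)
short
# [1] 1

# test is a 2 x 2 logical matrix, so the result is a 2 x 2 matrix as well
m <- matrix(c(1, 99, 99, 2), nrow = 2)
res <- ifelse(m == 99, NA, m)
dim(res)
# [1] 2 2
```

In the question, the no argument is a data frame rather than a vector or matrix, which is likely why the assignment back into df[, col] does not work as intended.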
I think replace can work here:
df[, col] <- replace(df[, col],
df[, col] == 99 & df$id == "A",
NA)
Result:
  id x1 x2
1  A  1 NA
2  A NA  1
3  A  2 NA
4  A NA  2
5  B  3  3
6  B 99  4
7  B  5 99
8  B  6  6
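As an alternative sketch (not necessarily better than replace), the same condition can be stored as a logical matrix and used for direct assignment. The name mask is introduced here only for illustration; the data and col are taken from the question:

```r
# Rebuild the example data from the question
df <- data.frame(id = c("A", "A", "A", "A", "B", "B", "B", "B"),
                 x1 = c(1, 99, 2, 99, 3, 99, 5, 6),
                 x2 = c(99, 1, 99, 2, 3, 4, 99, 6))
col <- c("x1", "x2")

# Logical matrix with one column per entry of `col`:
# TRUE where the value is 99 AND the row belongs to group "A"
mask <- df[, col] == 99 & df$id == "A"

# Set only the flagged cells to NA; everything else is left untouched
df[, col][mask] <- NA
df
```

Keeping the mask in its own variable makes it easy to inspect which cells will be changed before overwriting anything, which may help when the real dataset has many columns.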