I'm trying to create a subset that removes all rows where B == 0, but only when another variable A == 1. However, I want to keep the NAs in B (just remove the 0s).
I tried it with this:
df2 <- subset(df, B[df$A == 1] > 0)
but the result makes no sense. Can someone help?
Example data:
i <- c(1:10)
A <- c(0,1,1,1,0,0,1,1,0,1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)
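To see why the attempt above returns odd rows: `B[df$A == 1] > 0` is computed only over the six rows where `A == 1`, so it produces a logical vector of length 6, while `df` has 10 rows. `subset()` treats NA in the condition as FALSE, and R then recycles the short logical index across all 10 rows, so the rows that come back no longer correspond to the values that were tested. A quick check, reusing the question's data:

```r
i <- 1:10
A <- c(0, 1, 1, 1, 0, 0, 1, 1, 0, 1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

# The condition from the question is built only from the 6 rows with A == 1
cond <- df$B[df$A == 1] > 0
length(cond)  # 6, but nrow(df) is 10

# subset() turns NA into FALSE and the length-6 index is recycled over
# 10 rows, so rows 1, 2, 7 and 8 are returned, including two rows with B == 0
bad <- subset(df, B[df$A == 1] > 0)
bad$i
```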
Answer:
subset takes a condition and returns only the rows where that condition is TRUE. Any comparison with NA, such as NA == 0 or NA != 0, returns NA, which is neither TRUE nor FALSE, so subset drops those rows along with the FALSE ones. There are multiple ways around this:
subset(df, !(A == 1 & B == 0) | is.na(B))
or:
subset(df, !(A == 1 & B %in% 0))
There are plenty of other options as well.
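The NA behaviour both fixes rely on can be checked directly: a comparison with NA yields NA, whereas `%in%` always returns TRUE or FALSE. Both calls below keep every row except those where A == 1 and B == 0 (rows 7 and 8), including the rows where A is not 1. This is a sketch on the question's data only:

```r
i <- 1:10
A <- c(0, 1, 1, 1, 0, 0, 1, 1, 0, 1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

NA == 0    # NA: any comparison involving NA is NA
NA %in% 0  # FALSE: %in% never returns NA

# Fix 1: explicitly let the NA values of B through
keep1 <- subset(df, !(A == 1 & B == 0) | is.na(B))

# Fix 2: B %in% 0 is FALSE (not NA) when B is NA
keep2 <- subset(df, !(A == 1 & B %in% 0))

identical(keep1, keep2)  # TRUE
keep1$i                  # 1 2 3 4 5 6 9 10
```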
Answer:
If I understand correctly and you only want rows where A == 1, this should work:
subset(df, A == 1 & (B != 0 | is.na(B)))
Output:
    i A  B
2   2 1 10
3   3 1 13
4   4 1 NA
10 10 1 NA
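This answer also drops every row where A is not 1, so it reads the question as "keep only the A == 1 rows, minus those with B == 0". If instead only the A == 1 & B == 0 rows should go, the result is different. A side-by-side sketch of the two readings on the question's data:

```r
i <- 1:10
A <- c(0, 1, 1, 1, 0, 0, 1, 1, 0, 1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

# Reading 1: keep only rows with A == 1, then drop B == 0 (NA kept)
only_a1 <- subset(df, A == 1 & (B != 0 | is.na(B)))

# Reading 2: drop only the rows where A == 1 and B == 0; keep everything else
all_but <- subset(df, !(A == 1 & B %in% 0))

nrow(only_a1)  # 4
nrow(all_but)  # 8
```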
Answer:
If you do not want to specify every column, you can temporarily change the NAs to a placeholder number (for example 999 or -999), change the 0s to NA, drop the incomplete rows with na.omit(), and then switch the placeholders back to NA.
i <- c(1:10)
A <- c(0,1,1,1,0,0,1,1,0,1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)
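df[is.na(df)] <- 999   # park the real NAs on a placeholder value
df[df == 0] <- NA      # turn every 0 into NA
df <- na.omit(df)      # drop every row that now contains an NA
df[df == 999] <- NA    # restore the original NAs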
i A B
2 2 1 10
3 3 1 13
4 4 1 NA
10 10 1 NA
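The placeholder trick removes any row that contains a 0 in any column (here that also drops every row with A == 0), and it breaks if 999 can occur as a real value. A sketch of the same "drop rows with any zero" filter without temporary placeholders, swapping in `rowSums()` on the logical matrix that `df == 0` returns:

```r
i <- 1:10
A <- c(0, 1, 1, 1, 0, 0, 1, 1, 0, 1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

# df == 0 gives a logical matrix (NA where the cell is NA);
# na.rm = TRUE counts only the cells that are definitely 0
zeros_per_row <- rowSums(df == 0, na.rm = TRUE)

# Keep rows with no zero in any column; NA cells are left untouched
res <- df[zeros_per_row == 0, ]
res
```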
Answer:
If i is unique, identify which rows you want to remove and select the rest:
df[!(df$i %in% subset(df, A == 1 & B == 0)$i), ]
Output:
i A B
1 1 0 0
2 2 1 10
3 3 1 13
4 4 1 NA
5 5 0 NA
6 6 0 9
9 9 0 3
10 10 1 NA
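A caveat on matching ids: `!=` works element by element and recycles the shorter vector, so comparing `df$i` against a vector of ids with `!=` only works when the ids happen to line up with the recycled positions, as 7 and 8 do here. `%in%` tests membership instead. A small demonstration with a hypothetical pair of ids, 7 and 9:

```r
ids <- 1:10
drop_ids <- c(7, 9)  # hypothetical ids to remove

# Recycling compares position 7 with 7 but position 9 with 7 again,
# so id 9 is wrongly kept
by_neq <- ids[ids != drop_ids]

# Membership test: removes both ids regardless of their positions
by_in <- ids[!(ids %in% drop_ids)]

by_neq  # 1 2 3 4 5 6 8 9 10
by_in   # 1 2 3 4 5 6 8 10
```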