Removing 0s from dataframe without removing NAs

I am trying to create a subset in which I remove all rows where variable B == 0, given that another variable A == 1. However, I want to keep the NAs in variable B (just remove the 0s).

I tried df2 <- subset(df, B[df$A == 1] > 0), but the result makes no sense. Can someone help?

i <- c(1:10)
A <- c(0,1,1,1,0,0,1,1,0,1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

CodePudding user response：

subset takes a logical condition and returns only the rows where that condition is TRUE. Comparisons such as NA == 0 or NA != 0 always return NA, which is neither TRUE nor FALSE, and subset drops rows where the condition is NA. There are multiple ways around this:

subset(df, !(A == 1 & B == 0) | is.na(B))

or:

subset(df, !(A == 1 & B %in% 0))

There are plenty of other options as well.
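To make the NA behaviour concrete, here is a minimal sketch on the question's data. The names `cond`, `kept_naive`, and `kept_fixed` are introduced here purely for illustration. It also shows why the original attempt misbehaves: its condition has fewer elements than the data frame has rows, so it no longer lines up with them.

```r
# Data from the question
i <- 1:10
A <- c(0, 1, 1, 1, 0, 0, 1, 1, 0, 1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

# Comparing NA with a number yields NA, not TRUE or FALSE
NA == 0    # NA
NA != 0    # NA

# %in% never returns NA: a missing value simply fails to match 0
NA %in% 0  # FALSE

# The original attempt builds a condition of length 6 for a 10-row data frame
length(df$B[df$A == 1] > 0)  # 6

# Without NA handling, the condition is NA where A == 1 and B is NA
cond <- !(df$A == 1 & df$B == 0)
cond[c(4, 10)]  # NA NA

# subset() treats NA like FALSE, so rows 4 and 10 are lost
kept_naive <- subset(df, !(A == 1 & B == 0))
kept_naive$i  # 1 2 3 5 6 9

# With %in% the NA rows survive
kept_fixed <- subset(df, !(A == 1 & B %in% 0))
kept_fixed$i  # 1 2 3 4 5 6 9 10
```

Whether to add `| is.na(B)` or switch to `%in%` is mostly a matter of taste; both make the condition TRUE instead of NA for the rows with a missing B.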

CodePudding user response：

This should work, if I understand the question correctly (note that it keeps only the rows where A == 1):

subset(df, A == 1 & (B != 0 | is.na(B)))

outputs:

    i A  B
2   2 1 10
3   3 1 13
4   4 1 NA
10 10 1 NA
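The answers in this thread read the question in two different ways, so it may help to compare them side by side. This is only a sketch on the question's data; `keep_others` and `only_A1` are names made up here.

```r
# Data from the question
i <- 1:10
A <- c(0, 1, 1, 1, 0, 0, 1, 1, 0, 1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

# Reading 1: drop only the rows with A == 1 and B == 0, keep everything else
keep_others <- subset(df, !(A == 1 & B %in% 0))

# Reading 2: keep only rows with A == 1 whose B is non-zero or NA
only_A1 <- subset(df, A == 1 & (B != 0 | is.na(B)))

keep_others$i  # 1 2 3 4 5 6 9 10
only_A1$i      # 2 3 4 10

# Rows retained only under reading 1 (all of them have A == 0)
setdiff(keep_others$i, only_A1$i)  # 1 5 6 9
```

Which of the two is wanted depends on whether the rows with A == 0 should stay in the result.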

CodePudding user response：

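If you do not want to specify every single column, you can just change the 0s to NA and (temporarily) the NAs to a number (for example 999 or -999), then switch back once you are finished. Note that this removes every row containing a 0 in any column, including A, and that the placeholder must not already occur in the data.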

i <- c(1:10)
A <- c(0,1,1,1,0,0,1,1,0,1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

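# Temporarily mark the existing NAs with a placeholder value
df[is.na(df)] <- 999
# Turn every 0 into NA, then drop all rows that contain an NA
df[df == 0] <- NA
df <- na.omit(df)
# Restore the original NAs
df[df == 999] <- NA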

    i A  B
2   2 1 10
3   3 1 13
4   4 1 NA
10 10 1 NA
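If a placeholder value feels fragile, the same rows can probably be selected without one. The following is an alternative sketch, not the method used in this answer: it counts the zeros in each row with rowSums and ignores NAs while counting. `zeros_per_row` and `df_no_zero` are names introduced here.

```r
# Data from the question
i <- 1:10
A <- c(0, 1, 1, 1, 0, 0, 1, 1, 0, 1)
B <- c(0, 10, 13, NA, NA, 9, 0, 0, 3, NA)
df <- data.frame(i, A, B)

# df == 0 is a logical matrix that is NA wherever the data is NA;
# na.rm = TRUE makes those cells count as "not zero"
zeros_per_row <- rowSums(df == 0, na.rm = TRUE)

# Keep only the rows that contain no 0 in any column
df_no_zero <- df[zeros_per_row == 0, ]
df_no_zero$i  # 2 3 4 10
```

Like the placeholder approach, this looks at all columns at once, so rows with A == 0 are dropped as well.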

CodePudding user response：

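If i is unique, identify which cases you want to remove and select the rest. %in% is used instead of != so that more than one id can be excluded. Try:

df[!(df$i %in% subset(df, A == 1 & B == 0)$i), ]

Output:

    i A  B
1   1 0  0
2   2 1 10
3   3 1 13
4   4 1 NA
5   5 0 NA
6   6 0  9
9   9 0  3
10 10 1 NA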
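One caveat when several ids have to be excluded at once: != compares element by element and recycles the shorter vector, whereas %in% tests membership. A small sketch with made-up vectors (`ids` and `drop_ids` are hypothetical and not part of the question's data) shows the difference:

```r
ids <- 1:6
drop_ids <- c(2, 5)

# Element-wise with recycling: 1 vs 2, 2 vs 5, 3 vs 2, 4 vs 5, 5 vs 2, 6 vs 5
ids != drop_ids  # all TRUE, so nothing would be removed

# Membership test: TRUE for every id that is not in drop_ids
keep <- !(ids %in% drop_ids)
ids[keep]  # 1 3 4 6
```

With != the outcome depends on how the two vectors happen to line up, so it can look correct on one data set and fail silently on another.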