I want to implement the following rule:
If df$dummy == 0, delete all rows with NA values in columns 2:5.
I tried:
df[df$dummy==0] <- na.omit(df[2:5],)
But it does not work properly.
Can anyone help me?
CodePudding user response:
It's always better to include a small reproducible example; otherwise the folks here who answer your question will need to make one for you.
Suppose your data frame looks like this:
df <- data.frame(dummy = c(0, 1, 1, 0, 0, 0, 1, 1, 0, 0),
                 col2  = c(1, NA, 3, 4, 5, NA, 7, 8, 9, 10),
                 col3  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA),
                 col4  = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 col5  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
df
#>    dummy col2 col3 col4 col5
#> 1      0    1    1   NA    1
#> 2      1   NA    2    2    2
#> 3      1    3    3    3    3
#> 4      0    4    4    4    4
#> 5      0    5    5    5    5
#> 6      0   NA    6    6    6
#> 7      1    7    7    7    7
#> 8      1    8    8    8    8
#> 9      0    9    9    9    9
#> 10     0   10   NA   10   10
Then you can filter out the rows where dummy == 0 AND where any of columns 2:5 is NA by doing:
df[-which(df$dummy == 0 & apply(df[2:5], 1, anyNA)), ]
#>   dummy col2 col3 col4 col5
#> 2     1   NA    2    2    2
#> 3     1    3    3    3    3
#> 4     0    4    4    4    4
#> 5     0    5    5    5    5
#> 7     1    7    7    7    7
#> 8     1    8    8    8    8
#> 9     0    9    9    9    9
You will see that the only NA that remains occurs in a row where dummy == 1, as expected.
Created on 2021-11-12 by the reprex package (v2.0.0)
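The same filter can also be written with a logical index instead of -which(). This is only a minimal sketch, assuming the example df from this answer (rebuilt here so the block runs on its own); has_na and result are illustrative names that do not appear in the original answer. One reason to consider this form: if no row matches the condition, which() returns integer(0) and df[-integer(0), ] gives back zero rows, whereas the logical version returns the data frame unchanged.

```r
df <- data.frame(dummy = c(0, 1, 1, 0, 0, 0, 1, 1, 0, 0),
                 col2  = c(1, NA, 3, 4, 5, NA, 7, 8, 9, 10),
                 col3  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA),
                 col4  = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                 col5  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

# TRUE for rows that have at least one NA in columns 2:5
has_na <- !complete.cases(df[2:5])

# Keep every row except those where dummy == 0 and an NA is present
result <- df[!(df$dummy == 0 & has_na), ]
```

Here complete.cases() stands in for apply(df[2:5], 1, anyNA); both flag rows with a missing value in the chosen columns, so the kept rows should be the same as in the output above.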
CodePudding user response:
Does this work?
Data:
df <- data.frame(
  dummy = c(1, 0, 0, 1, 0),
  c1 = c(NA, NA, 2, 3, 1),
  c2 = c(NA, NA, NA, 1, 4)
)
Solution:
library(dplyr)
df %>%
  filter(!(dummy == 0 & if_any(starts_with("c"), is.na)))
  dummy c1 c2
1     1 NA NA
2     1  3  1
3     0  1  4
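Since the question refers to columns by position (2:5) rather than by name, the columns could probably also be selected by position inside if_any(). This is a hedged sketch, assuming dplyr 1.0.4 or later (the release that introduced if_any()) and the small example data from this answer, where the columns to check are 2:3; result is an illustrative name only.

```r
library(dplyr)

df <- data.frame(
  dummy = c(1, 0, 0, 1, 0),
  c1 = c(NA, NA, 2, 3, 1),
  c2 = c(NA, NA, NA, 1, 4)
)

# Select the columns to check by position (2:3 here) instead of by name
result <- df %>%
  filter(!(dummy == 0 & if_any(2:3, is.na)))
```

For the data frame in the question, 2:3 would become 2:5.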