Starting from a dataframe like this:
col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
col3 <- c("Mark", "Emma", "Paul", "George", "Samuel" )
df <- cbind(col1, col2, col3)
I would like to keep only the values listed in this vector:
selected <- c("Emma", "Katy", "Mark")
and replace all the others with NA, to get a new dataframe like this:
col1 col2 col3
NA NA "Mark"
"Emma" "Mark" "Emma"
"Katy" NA NA
NA NA NA
NA NA NA
I have tried the following code, and it works:
df[df != "Emma" & df != "Katy" & df != "Mark"] <- NA
but I would like to use the vector selected in the condition,
instead of writing out every comparison by hand.
My actual dataframe and vector of values are larger than the ones in this example.
Thanks in advance for your help!
Answer:
The code in the question creates a matrix with cbind(), not a data.frame. This matters because a data frame is a list of vectors (the columns), all of the same length, with names, row.names and class attributes, whereas a matrix is a single "folded" vector: one atomic vector with a dim attribute set.
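A quick way to see this difference is to inspect both objects. This is a minimal base-R sketch that rebuilds the example data; stringsAsFactors = FALSE is set only so the columns stay character in R versions older than 4.0.0, and the exact printed details may vary slightly between R versions.

```r
# Rebuild the example vectors from the question
col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
col3 <- c("Mark", "Emma", "Paul", "George", "Samuel")

mat <- cbind(col1, col2, col3)   # character matrix
df  <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)

# Matrix: one character vector of 15 values plus dim/dimnames attributes
typeof(mat)
names(attributes(mat))

# Data frame: a list with one element per column, and no dim attribute
typeof(df)
names(attributes(df))
dim(df)   # still 5 rows x 3 columns, computed by the dim.data.frame() method
```

Because the matrix is a single vector underneath, a vectorised test like %in% can be applied to it in one call, while the data frame has to be handled column by column.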
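- For data frames, loop over the columns (here with sapply()), applying `%in%` to each of them.
- For matrices, no loop is needed: `%in%` works on the underlying vector, and the resulting logical vector has the same length and column-major order as the matrix.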
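If an explicit per-column loop reads more clearly than the sapply() one-liner, the data frame case can also be written with lapply() and replace() from base R. This is an alternative sketch, not the method used in the answer below; assigning into df[] (rather than df) keeps the result a data frame with its row names.

```r
col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
col3 <- c("Mark", "Emma", "Paul", "George", "Samuel")
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
selected <- c("Emma", "Katy", "Mark")

# For each column, set every value that is not in `selected` to NA.
# df[] <- ... replaces the columns in place, so df stays a data.frame.
df[] <- lapply(df, function(x) replace(x, !x %in% selected, NA))
df
```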
col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
col3 <- c("Mark", "Emma", "Paul", "George", "Samuel" )
mat <- cbind(col1, col2, col3)
df <- data.frame(col1, col2, col3)
selected <- c("Emma", "Katy", "Mark")
# Data frame: sapply() applies %in% to each column and returns a logical matrix
is.na(df) <- !sapply(df, `%in%`, selected)
df
#> col1 col2 col3
#> 1 <NA> <NA> Mark
#> 2 Emma Mark Emma
#> 3 Katy <NA> <NA>
#> 4 <NA> <NA> <NA>
#> 5 <NA> <NA> <NA>
# Matrix: %in% works on the underlying vector, so no loop is needed
is.na(mat) <- !mat %in% selected
mat
#> col1 col2 col3
#> [1,] NA NA "Mark"
#> [2,] "Emma" "Mark" "Emma"
#> [3,] "Katy" NA NA
#> [4,] NA NA NA
#> [5,] NA NA NA
Created on 2022-03-20 by the reprex package (v2.0.1)
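The same logical index also works with ordinary subset assignment, which stays close to the line from the question: the chain of != comparisons is replaced by a single %in% test against selected. A minimal base-R sketch for both object types:

```r
col1 <- c("Anne", "Emma", "Katy", "Albert", "Richard")
col2 <- c("Albert", "Mark", "Mike", "Loren", "Anne")
col3 <- c("Mark", "Emma", "Paul", "George", "Samuel")
mat <- cbind(col1, col2, col3)
df  <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
selected <- c("Emma", "Katy", "Mark")

# Matrix: %in% returns one logical per cell in column-major order,
# so it can index the matrix directly
mat[!mat %in% selected] <- NA

# Data frame: sapply() builds a logical matrix with the same shape as df,
# which data frame subset assignment accepts as an index
df[!sapply(df, `%in%`, selected)] <- NA
```

This scales to a selected vector of any length without writing more conditions.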