Home > OS >  Exclude specific value from mutate w/ ifelse
Exclude specific value from mutate w/ ifelse

Time:02-19

I am sure this is a simple question that has been asked before but the way I am phrasing it may be preventing me from getting the answers. I'm very new at programming and don't know the language very well (or how to code at all).

I am gradually editing Dataset A by matching values of one of its columns to columns of other datasets. On a match, I am writing "E" in column "Existing" of Dataset A. I am using the dplyr package with mutate and ifelse, knowledge of which I got from more experienced users in this community.

This is an example of the code I am using:

DatasetA <- DatasetA %>% mutate(Existing = ifelse(ID %in% DatasetB$ID, 'E',NA))

Since I am doing this more than once, my first thought was to just copy-paste the same code and change the tables and columns I need. Of course, eventually I realised this overwrites the previous "E"s and "NA"s.

I need to change the code so that if there is an "E" in the column already, it is not touched by mutate. It just adds new "E"s.

CodePudding user response:

This will preserve the E of ID 3, even if its not part of DatasetB$ID, but because it was E in the first place:

library(tidyverse)

DatasetB <- tibble(ID = seq(2))
DatasetB
#> # A tibble: 2 x 1
#>      ID
#>   <int>
#> 1     1
#> 2     2
DatasetA <- tibble(ID = c(2, 3, 4), Existing = c(NA, "E", "A"))
DatasetA
#> # A tibble: 3 x 2
#>      ID Existing
#>   <dbl> <chr>   
#> 1     2 <NA>    
#> 2     3 E       
#> 3     4 A

DatasetA %>%
  mutate(
    Existing = case_when(
      # preserve just E
      Existing == "E" ~ "E",
      ID %in% DatasetB$ID ~ "E"
    )
  )
#> # A tibble: 3 x 2
#>      ID Existing
#>   <dbl> <chr>   
#> 1     2 E       
#> 2     3 E       
#> 3     4 <NA>


DatasetA %>%
  mutate(
    Existing = case_when(
      # preserve any non NA character
      !is.na(Existing) ~ Existing,
      ID %in% DatasetB$ID ~ "E"
    )
  )
#> # A tibble: 3 x 2
#>      ID Existing
#>   <dbl> <chr>   
#> 1     2 E       
#> 2     3 E       
#> 3     4 A

case_when is a more flexible way than using ifelse which can handle multiple cases / conditions.

CodePudding user response:

You can nest ifelse statements to achieve what you're willing to do. In your case let a dataset Dataset with A and B columns :

Dataset <- Dataset %>% mutate(Existing = ifelse(ID %in% Dataset$A, 'E',ifelse(ID %in% Dataset$B,'E','NA'))

So this command will check if ID is in column A, if yes you will obtain "E". If not, it will check if it is in column B, if yes you will obtain "E" and if not you will obtain "NA". Nesting ifelse statements is an easy and intuitive ways to check several if else conditions at once.

A limitation of ifelse statement is that they can only give you a scalar. If you want a vector as a result you have to use traditional if...else statements.

  • Related