Home > Software engineering >  How to get dplyr::ifelse function to ignore NA values in dataframe
How to get dplyr::ifelse function to ignore NA values in dataframe

Time:10-28

I'm having an issue with NA values not being handled the way I need/expect by the dplyr::ifelse statement, and I wondered if anyone could help please?

In this example I am trying to populate a new column (named pair) based on matching the results from another column named (min_col) with each of 3 other columns (A, B and C). So for example if the min_col is the same as column A then the pair column value should correspond to the value in column A1. This mostly works except....

The problem: some of the rows return NA when they should return a value, but some return a value (correctly) even if there is an NA present in that row. It seems to depend on the position of NA in the row.

Example

library(dplyr)

# create a dataframe with some `NA` values
df <- data.frame(A = c(NA, 5:11, NA, NA), A1 = letters[1:10], 
                 B = c(1, NA, 20, NA, 22:26, NA), B1 = letters[11:20], 
                 C = c(6:7, NA, 0, NA, 2:5, NA), C1 = letters[16:25])


#identify the column (either `A`, `B`, or `C`) which has the minimum value
df$min_col <- pmin(df$A, df$B, df$C, na.rm=T)


# find the letter pair (from column `A1`, `B1` or `C1`) which goes with the minimum value
# so for example if the minimum value is the same as column `A` then the `pair` column should be the same as column `A1`.
 
df$pair <- ifelse(df$min_col == df$A, df$A1,
                  ifelse(df$min_col == df$B, df$B1,
                         ifelse(df$min_col == df$C, df$C1, NA)))

This gives the following output

enter image description here

Rows 1, 4 and 9 are returning an NA in the column named pair when I want them to return a letter. In contrast row 5 does give the correct output, even though this row also contains an NA. I can't figure out the issue is.

I would prefer not to replace the NAs with 0 because my real-life data frame actually has columns of lots of different classes including dates , so it gets a bit complicated.

Would be very grateful for any advise!

CodePudding user response:

ifelse is not a dplyr function, if_else is. However, case_when is a good option rather than multiple, nested ifelse functions.

Does this give you the expected output?

library(dplyr)

df |> 
  mutate(min_col = pmin(A, B, C, na.rm = T),
         pair = case_when(min_col == A ~ A1,
                          min_col == B ~ B1,
                          min_col == C ~ C1,
                          TRUE ~ NA_character_))
#>     A A1  B B1  C C1 min_col pair
#> 1  NA  a  1  k  6  p       1    k
#> 2   5  b NA  l  7  q       5    b
#> 3   6  c 20  m NA  r       6    c
#> 4   7  d NA  n  0  s       0    s
#> 5   8  e 22  o NA  t       8    e
#> 6   9  f 23  p  2  u       2    u
#> 7  10  g 24  q  3  v       3    v
#> 8  11  h 25  r  4  w       4    w
#> 9  NA  i 26  s  5  x       5    x
#> 10 NA  j NA  t NA  y      NA <NA>

Created on 2022-10-28 with reprex v2.0.2

  • Related