I'm having an issue with NA values not being handled the way I need/expect by the dplyr::ifelse
statement, and I wondered if anyone could help please?
In this example I am trying to populate a new column (named pair
) based on matching the results from another column named (min_col
) with each of 3 other columns (A
, B
and C
). So for example if the min_col
is the same as column A
then the pair
column value should correspond to the value in column A1
. This mostly works except....
The problem: some of the rows return NA
when they should return a value, but some return a value (correctly) even if there is an NA
present in that row. It seems to depend on the position of NA
in the row.
Example
library(dplyr)
# create a dataframe with some `NA` values
df <- data.frame(A = c(NA, 5:11, NA, NA), A1 = letters[1:10],
B = c(1, NA, 20, NA, 22:26, NA), B1 = letters[11:20],
C = c(6:7, NA, 0, NA, 2:5, NA), C1 = letters[16:25])
#identify the column (either `A`, `B`, or `C`) which has the minimum value
df$min_col <- pmin(df$A, df$B, df$C, na.rm=T)
# find the letter pair (from column `A1`, `B1` or `C1`) which goes with the minimum value
# so for example if the minimum value is the same as column `A` then the `pair` column should be the same as column `A1`.
df$pair <- ifelse(df$min_col == df$A, df$A1,
ifelse(df$min_col == df$B, df$B1,
ifelse(df$min_col == df$C, df$C1, NA)))
This gives the following output
Rows 1, 4 and 9 are returning an NA
in the column named pair
when I want them to return a letter. In contrast row 5 does give the correct output, even though this row also contains an NA
. I can't figure out the issue is.
I would prefer not to replace the NA
s with 0 because my real-life data frame actually has columns of lots of different classes including dates , so it gets a bit complicated.
Would be very grateful for any advise!
CodePudding user response:
ifelse
is not a dplyr function, if_else
is. However, case_when
is a good option rather than multiple, nested ifelse
functions.
Does this give you the expected output?
library(dplyr)
df |>
mutate(min_col = pmin(A, B, C, na.rm = T),
pair = case_when(min_col == A ~ A1,
min_col == B ~ B1,
min_col == C ~ C1,
TRUE ~ NA_character_))
#> A A1 B B1 C C1 min_col pair
#> 1 NA a 1 k 6 p 1 k
#> 2 5 b NA l 7 q 5 b
#> 3 6 c 20 m NA r 6 c
#> 4 7 d NA n 0 s 0 s
#> 5 8 e 22 o NA t 8 e
#> 6 9 f 23 p 2 u 2 u
#> 7 10 g 24 q 3 v 3 v
#> 8 11 h 25 r 4 w 4 w
#> 9 NA i 26 s 5 x 5 x
#> 10 NA j NA t NA y NA <NA>
Created on 2022-10-28 with reprex v2.0.2