When replacing with medians, an unknown category gets NA in R

Time:04-11

I have two datasets

d1=structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f", 
"f"), ggh = c("h", "h"), wer = c(23L, 45L)), class = "data.frame", row.names = c(NA, 
-2L))

and

d2=structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f", 
"f"), ggh = c("h", "f"), wer = c(3L, 7L)), class = "data.frame", row.names = c(NA, 
-2L))

I want to change values according to the following rule: if, for a category that also exists in d1, the value of wer in d2 is less or greater by 1 than the median of wer from d1 for that category, then d2 should get the median for that category.

To make it clearer what I want, take this category from d1:

et  gg  hj  ggh   (these are the categorical variables)
s   d   f   h

The median of wer for this category is 34.

d2 has the same category s d f h with wer = 3. Since 3 < 34, this value must be changed to 34. But d2 also has the category s d f f, which is absent from d1, so for a category unknown in d1 the value from d2 should be left as it is.

Currently I use this code:

library(dplyr)

d1 %>% 
  group_by(across(-wer)) %>% 
  summarise(wer = median(wer), .groups = "drop") %>% 
  right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>% 
  mutate(wer = ifelse(wer >= wer.y, wer, wer.y), .keep = "unused")

It does what I need; however, for a category that is unknown in d1, it returns NA:

  et    gg    hj    ggh     wer
  <chr> <chr> <chr> <chr> <dbl>
1 s     d     f     h        34
2 s     d     f     f        NA

Instead, the result should keep the real value for this category from d2, like this:

  et    gg    hj    ggh     wer
  <chr> <chr> <chr> <chr> <dbl>
1 s     d     f     h        34
2 s     d     f     f        7

How can I fix it? Thank you for your help.

CodePudding user response:

Comparison operators return NA when one of the operands is NA:

> 7 > NA
[1] NA
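
The same propagation happens inside ifelse: when the test is NA, the result is NA no matter what the yes and no arguments hold. A minimal base R sketch, where med and val are hypothetical vectors standing in for the joined wer and wer.y columns:

```r
# Hypothetical stand-ins for the columns after the right join:
# med = median from d1 (NA for the category missing in d1), val = wer from d2
med <- c(34, NA)
val <- c(3L, 7L)

# The comparison itself yields NA for the unmatched row
cmp <- med >= val

# ifelse returns NA wherever the test is NA
unguarded <- ifelse(cmp, med, val)

# Guarding with !is.na() turns the NA test into FALSE, so val is used instead
guarded <- ifelse(!is.na(med) & med >= val, med, val)

print(cmp)        # TRUE NA
print(unguarded)  # 34 NA
print(guarded)    # 34 7
```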

In the code, we just need to handle the NA case by adding a condition with is.na:

library(dplyr)
d1 %>% 
   group_by(across(-wer)) %>% 
   summarise(wer = median(wer), .groups = "drop") %>% 
   right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>% 
   mutate(wer = ifelse(wer >= wer.y & !is.na(wer), wer, wer.y), .keep = "unused")
# A tibble: 2 × 5
  et    gg    hj    ggh     wer
  <chr> <chr> <chr> <chr> <dbl>
1 s     d     f     h        34
2 s     d     f     f         7
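
A possible alternative, assuming the intent of the ifelse is simply "take the larger of the d1 median and the d2 value": pmax with na.rm = TRUE ignores the missing median and falls back to the d2 value. This is only a sketch on the example data (rebuilt here with data.frame so the block runs on its own), and result is a name introduced for illustration. It would behave differently from the ifelse version if wer in d2 could itself be NA.

```r
library(dplyr)

# Example data from the question
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))
d2 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "f"), wer = c(3L, 7L))

result <- d1 %>%
  group_by(across(-wer)) %>%
  summarise(wer = median(wer), .groups = "drop") %>%
  right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>%
  # Row-wise maximum; na.rm = TRUE drops the NA median for unmatched categories
  mutate(wer = pmax(wer, wer.y, na.rm = TRUE), .keep = "unused")

print(result)
```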