I have two datasets:
d1=structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f",
"f"), ggh = c("h", "h"), wer = c(23L, 45L)), class = "data.frame", row.names = c(NA,
-2L))
and
d2=structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f",
"f"), ggh = c("h", "f"), wer = c(3L, 7L)), class = "data.frame", row.names = c(NA,
-2L))
I want to change values in d2 according to the following rule: if, for a category that also exists in d1, the value of wer in d2 is less or greater than the d1 median for that category by 1, then the d2 value should be replaced with the median for that category.
To make it clearer what I want: for this category from d1 (et, gg, hj and ggh are the categorical variables)
et gg hj ggh
s d f h
the median of wer is 34.
d2 has the same category s d f h with wer = 3. Since 3 < 34, I must change this value to 34.
d2 also has the category s d f f, which is absent from d1, so the d2 value for that category should be left as it is.
Currently I use this code:
library(dplyr)
d1 %>%
group_by(across(-wer)) %>%
summarise(wer = median(wer), .groups = "drop") %>%
right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>%
mutate(wer = ifelse(wer >= wer.y, wer, wer.y), .keep = "unused")
It does what I need; however, for a category that is unknown in d1, it returns NA:
et gg hj ggh wer
<chr> <chr> <chr> <chr> <dbl>
1 s d f h 34
2 s d f f NA
Instead, the result should keep the real value from d2 for this category, like this:
et gg hj ggh wer
<chr> <chr> <chr> <chr> <dbl>
1 s d f h 34
2 s d f f 7
How can I fix it? Thank you for your help.
Answer:
Comparison operators return NA when either operand is NA:
> 7 > NA
[1] NA
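As a small illustration of how this NA propagates through ifelse, here is a sketch using two standalone vectors. The names med and val are hypothetical stand-ins for the joined median column and the d2 values; they are not part of the original code.

```r
# Hypothetical stand-ins: medians after the join (NA = category absent from d1)
med <- c(34, NA)
# Hypothetical stand-ins: the corresponding wer values from d2
val <- c(3, 7)

# The comparison is NA wherever the median is NA
cmp <- med >= val

# ifelse returns NA when its test is NA, which explains the NA in the result
unguarded <- ifelse(cmp, med, val)

# NA & FALSE evaluates to FALSE, so guarding with !is.na() removes the NA test
guarded <- ifelse(cmp & !is.na(med), med, val)

print(unguarded)
print(guarded)
```

Because NA & FALSE is FALSE in R, the guarded test falls through to the "no" branch and the d2 value is kept.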
In the code, we just need to handle the NA case by adding a condition with is.na:
library(dplyr)
d1 %>%
group_by(across(-wer)) %>%
summarise(wer = median(wer), .groups = "drop") %>%
right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>%
mutate(wer = ifelse(wer >= wer.y & !is.na(wer), wer, wer.y), .keep = "unused")
# A tibble: 2 × 5
et gg hj ggh wer
<chr> <chr> <chr> <chr> <dbl>
1 s d f h 34
2 s d f f 7
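An alternative sketch, not taken from the answer above: since the code keeps the larger of the median and the d2 value, pmax with na.rm = TRUE can express the same thing and ignores the missing median. The intermediate names med_tbl, med and res are hypothetical, and the data frames are rebuilt here with data.frame() only to keep the example self-contained.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example runs on its own
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))
d2 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "f"), wer = c(3L, 7L))

# Median of wer per category in d1 (med_tbl and med are hypothetical names)
med_tbl <- d1 %>%
  group_by(across(-wer)) %>%
  summarise(med = median(wer), .groups = "drop")

# Keep every row of d2; categories absent from d1 get med = NA,
# and pmax(..., na.rm = TRUE) then falls back to the d2 value
res <- d2 %>%
  left_join(med_tbl, by = c("et", "gg", "hj", "ggh")) %>%
  mutate(wer = pmax(wer, med, na.rm = TRUE)) %>%
  select(-med)

print(res)
```

Joining from d2 with left_join keeps the rows in d2's order; whether that matters depends on how the result is used afterwards.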