A simple example to reproduce:
d1 = structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f",
"f"), ggh = c("h", "h"), wer = c(23L, 45L)), class = "data.frame", row.names = c(NA,
-2L))
where et, gg, hj and ggh are categorical variables and wer is a metric variable. So, for this category
et gg hj ggh
s d f h
the median (by wer) is 34.
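For reference, the per-category median can be checked with a short base R sketch. It rebuilds d1 with data.frame() instead of the structure() call, and the name med is only illustrative.

```r
# d1 rebuilt with the same values as in the question
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))

# Median of wer within each combination of the four categorical columns
med <- aggregate(wer ~ et + gg + hj + ggh, data = d1, FUN = median)
med
```

With only one category in d1, med has a single row whose wer is the median of 23 and 45.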
There is a second dataset:
d2 <- structure(list(et = "s", gg = "d", hj = "f", ggh = "h", wer = 3L), class = "data.frame", row.names = c(NA,
-1L))
for this category
et gg hj ggh
s d f h
wer equals 3.
How can I do the following: if, in dataset d2, the value of wer for a category that also exists in d1 is less than or greater than the d1 median for that category by at least 1, then replace the value in d2 with the median for that category?
So in this simple example, the desired output in d2 will be
et gg hj ggh wer
s d f h 34
because 3 from the d2 dataset is less than 34 (the median for this category in d1) by 31.
Thank you for your help.
CodePudding user response:
You could calculate the median of d1 and then do a right_join on d2:
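library(dplyr)

d1 %>%
  # group by every column except wer, i.e. the four categorical variables
  group_by(across(-wer)) %>%
  # median of wer per category
  summarise(wer = median(wer), .groups = "drop") %>%
  # keep every row of d2; d2's own wer gets the suffix .y
  right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>%
  # take the d1 median when it is at least d2's value, otherwise keep d2's value
  mutate(wer = ifelse(wer >= wer.y, wer, wer.y), .keep = "unused")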
This returns
# A tibble: 1 x 5
  et    gg    hj    ggh     wer
  <chr> <chr> <chr> <chr> <dbl>
1 s     d     f     h        34
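Note that the ifelse above only substitutes the median when it is at least as large as the value in d2. If the value should also be replaced when it lies above the median, a variant along the following lines might work. This is only a sketch: the threshold of 1 is one reading of the question, and the names keys, medians, med and result are illustrative. It also leaves rows of d2 untouched when their category does not occur in d1.

```r
library(dplyr)

# Data rebuilt with the same values as in the question
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))
d2 <- data.frame(et = "s", gg = "d", hj = "f", ggh = "h", wer = 3L)

keys <- c("et", "gg", "hj", "ggh")

# Median of wer per category in d1, stored in a separate column med
medians <- d1 %>%
  group_by(across(all_of(keys))) %>%
  summarise(med = median(wer), .groups = "drop")

result <- d2 %>%
  # attach the d1 median to each d2 row; med is NA for categories missing in d1
  left_join(medians, by = keys) %>%
  # assumed rule: replace wer when it differs from the median by at least 1
  mutate(wer = ifelse(!is.na(med) & abs(wer - med) >= 1, med, wer)) %>%
  select(-med)

result
```

For the example data, the difference between 3 and 34 is 31, so wer in result becomes 34.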