How to change values in data depending on group median

Time:04-09

A simple example to reproduce

d1 = structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f", 
    "f"), ggh = c("h", "h"), wer = c(23L, 45L)), class = "data.frame", row.names = c(NA, 
    -2L))

where et, gg, hj and ggh are categorical variables and wer is a metric variable. So, for this category

    et  gg  hj  ggh
    s   d   f   h

the median (by wer) is 34.
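As a quick check, the per-group median can be computed in base R. This is only a sketch: the name `med` is introduced here for illustration, and `d1` is rebuilt with `data.frame()` so the snippet runs on its own.

```r
# Rebuild d1 from the question so the snippet is self-contained
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))

# One row per combination of the four categorical columns,
# with wer replaced by the median of wer within that combination
med <- aggregate(wer ~ et + gg + hj + ggh, data = d1, FUN = median)
med
```

For the two rows above, the single group has median (23 + 45) / 2 = 34.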

There is a second dataset

d2 <- structure(list(et = "s", gg = "d", hj = "f", ggh = "h", wer = 3L), class = "data.frame", row.names = c(NA, 
    -1L))

for this category

    et  gg  hj  ggh
    s   d   f   h

wer equals 3

How can I do the following: if the value of wer in d2 is less than or greater than the d1 median for the same category by 1 or more, replace it in d2 with the median of that category? In this simple example, the desired output in d2 is

    et  gg  hj  ggh wer
    s   d   f   h   34

because 3 from the d2 dataset is less than 34 (the median for this category in d1) by 31.

Thank you for your help.

CodePudding user response:

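You could calculate the per-group median of d1 and then right_join the result to d2. Note that the ifelse below keeps whichever of the two values is larger, so it only replaces d2 values that lie below the median: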

library(dplyr)

d1 %>% 
  group_by(across(-wer)) %>% 
  summarise(wer = median(wer), .groups = "drop") %>% 
  right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>% 
  mutate(wer = ifelse(wer >= wer.y, wer, wer.y), .keep = "unused")

For the example data, this returns

# A tibble: 1 x 5
  et    gg    hj    ggh     wer
  <chr> <chr> <chr> <chr> <dbl>
1 s     d     f     h        34
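If values above the median should be replaced as well (the question says "less or greater"), a base R variant might look like the sketch below. It rests on a few assumptions that are not confirmed by the question: the threshold is an absolute difference of 1 or more, rows of d2 with no matching category in d1 are left unchanged, and the names `keys`, `med`, `out` and `replace_it` are introduced here for illustration.

```r
# Rebuild both data frames so the snippet is self-contained
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))
d2 <- data.frame(et = "s", gg = "d", hj = "f", ggh = "h", wer = 3L)

keys <- c("et", "gg", "hj", "ggh")

# Per-category median of wer in d1, stored in a column called med
med <- aggregate(wer ~ et + gg + hj + ggh, data = d1, FUN = median)
names(med)[names(med) == "wer"] <- "med"

# Left join: keep every row of d2; categories absent from d1 get med = NA
# (note that merge() may reorder the rows of d2)
out <- merge(d2, med, by = keys, all.x = TRUE)

# Assumed rule: replace wer when it differs from the median by 1 or more,
# in either direction; rows without a median are left as they are
replace_it <- !is.na(out$med) & abs(out$wer - out$med) >= 1
out$wer <- ifelse(replace_it, out$med, out$wer)
out$med <- NULL
out
```

Using abs() makes the rule symmetric, and the is.na() guard avoids turning unmatched rows into NA.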