Clean way to select variable for calculations depending on other variable value in R

Time:03-29

I'm working with a dataframe with the following structure:

ID     origin    value1    value2
1        A         100       50
1        A         200       100
2        B         10        2
2        B         150       30

So each row can have different origins and I need to make some calculations by ID, but the value variable I'm using depends on the origin variable. So if origin == 'A' I should use value1 and if it's B I should use value2. My code without taking this last condition into account looks like this:

df2 <- df %>% 
  group_by(ID) %>% 
  mutate(mean_value = mean(value1, na.rm = TRUE),
         sd_value = sd(value1, na.rm = TRUE),
         median_value = median(value1, na.rm = TRUE),
         cv_value = sd_value/mean_value,
         p25_value = quantile(value1, 0.25, na.rm = TRUE),
         p75_value = quantile(value1, 0.75, na.rm = TRUE)) 

I know I could add an if_else statement to each line, but I think my code would lose some readability (in my actual data there are multiple origins, which makes this a bit more cumbersome). So I was thinking of creating a custom function, maybe using map, or maybe something using group_by on origin, but I'm not finding a good way to implement these options. Any ideas? My desired dataframe would look like this (I'll add only the first mutate column for simplicity):

ID     origin    value1    value2 mean_value 
1        A         100       50      150
1        A         200       100     150
2        B         10        2       16
2        B         150       30      16

So the first mean value is (100 + 200) / 2 (from value1) and the second is (30 + 2) / 2 (from value2).

Thanks!

Answer:

We could first create a temporary column that holds the relevant value for each row, and then compute the mean on that column. This way, ifelse/case_when is needed only once.

library(dplyr)
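df %>%
   # pick the source column once, based on origin
   mutate(valuenew = case_when(origin == 'A' ~ value1,
                               TRUE ~ value2)) %>%
   group_by(ID) %>%
   # .keep = "unused" drops valuenew, since it is used to build mean_value
   mutate(mean_value = mean(valuenew, na.rm = TRUE), .keep = "unused") %>%
   ungroup()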

-output

# A tibble: 4 × 5
     ID origin value1 value2 mean_value
  <int> <chr>   <int>  <int>      <dbl>
1     1 A         100     50        150
2     1 A         200    100        150
3     2 B          10      2         16
4     2 B         150     30         16
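
The same temporary column can feed every statistic from the question, not only the mean. The following is a minimal sketch under a few assumptions: the name df2 is taken from the question, if_else is used in place of case_when because there are only two origins here, and names = FALSE is passed to quantile so it returns a plain number. The temporary column is dropped with select at the end, since .keep = "unused" is not needed when done this way.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID     = c(1L, 1L, 2L, 2L),
  origin = c("A", "A", "B", "B"),
  value1 = c(100L, 200L, 10L, 150L),
  value2 = c(50L, 100L, 2L, 30L)
)

df2 <- df %>%
  # choose the source column once, row by row
  mutate(valuenew = if_else(origin == "A", value1, value2)) %>%
  group_by(ID) %>%
  mutate(
    mean_value   = mean(valuenew, na.rm = TRUE),
    sd_value     = sd(valuenew, na.rm = TRUE),
    median_value = median(valuenew, na.rm = TRUE),
    # later columns can refer to ones created earlier in the same mutate
    cv_value     = sd_value / mean_value,
    p25_value    = quantile(valuenew, 0.25, na.rm = TRUE, names = FALSE),
    p75_value    = quantile(valuenew, 0.75, na.rm = TRUE, names = FALSE)
  ) %>%
  ungroup() %>%
  # drop the helper column once all statistics are computed
  select(-valuenew)
```

Each statistic then reads exactly like the original code in the question, with valuenew in place of value1.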

data

df <- structure(list(ID = c(1L, 1L, 2L, 2L), origin = c("A", "A", "B", 
"B"), value1 = c(100L, 200L, 10L, 150L), value2 = c(50L, 100L, 
2L, 30L)), class = "data.frame", row.names = c(NA, -4L))
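
If there are many origins, one possible variation is to keep the origin-to-column mapping in a named vector and pick the value with matrix indexing, instead of writing one case_when branch per origin. This is only a sketch: value_col, m, idx and df3 are hypothetical names, and the mapping below covers just the two origins from the sample data. More entries would be added in the same way for real data.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID     = c(1L, 1L, 2L, 2L),
  origin = c("A", "A", "B", "B"),
  value1 = c(100L, 200L, 10L, 150L),
  value2 = c(50L, 100L, 2L, 30L)
)

# Hypothetical lookup: which column holds the value for each origin
value_col <- c(A = "value1", B = "value2")

# Matrix of the candidate value columns
m <- as.matrix(df[unique(value_col)])

# One (row, column) pair per row; as.character guards against factor origins
idx <- cbind(
  seq_len(nrow(df)),
  match(value_col[as.character(df$origin)], colnames(m))
)

df3 <- df %>%
  mutate(valuenew = m[idx]) %>%   # origins missing from value_col give NA
  group_by(ID) %>%
  mutate(mean_value = mean(valuenew, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-valuenew)
```

The grouped mutate stays the same as before; only the way valuenew is built changes.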