I'm working with a dataframe with the following structure:
ID origin value1 value2
1 A 100 50
1 A 200 100
2 B 10 2
2 B 150 30
So each row can have different origins and I need to make some calculations by ID, but the value variable I'm using depends on the origin variable. So if origin == 'A'
I should use value1
and if it's B I should use value2
. My code without taking this last condition into account looks like this:
df2 <- df %>%
group_by(ID) %>%
mutate(mean_value = mean(value1, na.rm = TRUE),
sd_value = sd(value1, na.rm = TRUE),
median_value = median(value1, na.rm = TRUE),
cv_value = sd_value1/mean_value1,
p25_value = quantile(value1, 0.25, na.rm = TRUE),
p75_value = quantile(value1, 0.75, na.rm = TRUE))
I know I could add an if_else
statement to each line, but I think my code will lose some readability (In my actual data there's multiple origins, which makes this a bit more cumbersome). So, I was thinking of creating a custom function, maybe using map
or maybe something using group_by origin, but I'm not finding a good way to implement these options. Any ideas? My desired dataframe would look like this (I'll add only the first mutate column for simplicity):
ID origin value1 value2 mean_value
1 A 100 50 150
1 A 200 100 150
2 B 10 2 16
2 B 150 30 16
So the first mean value is (100 200) / 2
(from value1) and the second is (30 2) / 2
(from value2).
Thanks!
CodePudding user response:
We could create a temporary column first and then do the mean
afterwards. In this way, we may need to use ifelse/case_when
only once
library(dplyr)
df %>%
mutate(valuenew = case_when(origin == 'A' ~ value1,
TRUE ~ value2)) %>%
group_by(ID) %>%
mutate(mean_value = mean(valuenew, na.rm = TRUE), .keep = "unused") %>%
ungroup
-output
# A tibble: 4 × 5
ID origin value1 value2 mean_value
<int> <chr> <int> <int> <dbl>
1 1 A 100 50 150
2 1 A 200 100 150
3 2 B 10 2 16
4 2 B 150 30 16
data
df <- structure(list(ID = c(1L, 1L, 2L, 2L), origin = c("A", "A", "B",
"B"), value1 = c(100L, 200L, 10L, 150L), value2 = c(50L, 100L,
2L, 30L)), class = "data.frame", row.names = c(NA, -4L))