I have a data frame that looks like this, but larger:
values <- c(22,16,23,15,14.5,19)
groups <- rep(c("a","b"), each = 3)
df <- data.frame(groups, values)
I have between 1 and 3 values per group (in the example, 3 values for group a and 3 values for group b). I now want to exclude the most dissimilar value from each group. In this example, I would want to exclude 16 from group a and 19 from group b.
Thank you for your help!
CodePudding user response:
If you're looking for one value to discard, you can remove the observation that lies farthest from its group's mean:
library(dplyr)

df %>%
  group_by(groups) %>%
  # absolute distance of each value from its group mean
  mutate(dist = abs(values - mean(values))) %>%
  # drop the row(s) farthest from the mean
  filter(dist != max(dist))
# A tibble: 4 × 3
# Groups:   groups [2]
  groups values  dist
  <chr>   <dbl> <dbl>
1 a        22    1.67
2 a        23    2.67
3 b        15    1.17
4 b        14.5  1.67
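One caveat, since the question mentions groups with only 1 or 2 values: `filter(dist != max(dist))` drops every row tied at the maximum distance. A one-value group has `dist` 0, which is also its maximum, so the whole group disappears, and a two-value group typically loses both rows because both values are equally far from their mean. Below is a minimal sketch of one way to handle this, assuming groups with fewer than 3 values should be kept unchanged (the question does not say what should happen to them). It is written in base R so it runs without extra packages; `keep_in_group` is a hypothetical helper name, and groups `c` and `d` are added to the example data only to illustrate the small-group case.

```r
# Example data from the question, plus two small groups (c, d) added for illustration
values <- c(22, 16, 23, 15, 14.5, 19, 7, 3, 4)
groups <- c(rep(c("a", "b"), each = 3), "c", "d", "d")
df <- data.frame(groups, values)

# Hypothetical helper: returns TRUE for the rows of one group that should be kept.
# Assumption: groups with fewer than 3 values are kept whole, because
# "most dissimilar" is not well defined for 1 or 2 values.
keep_in_group <- function(x) {
  keep <- rep(TRUE, length(x))
  if (length(x) >= 3) {
    # which.max() returns only the first maximum, so exactly one row is dropped
    keep[which.max(abs(x - mean(x)))] <- FALSE
  }
  keep
}

# ave() applies the helper per group and returns the results in the original row order;
# it coerces the logical result to 0/1, hence as.logical()
keep <- as.logical(ave(df$values, df$groups, FUN = keep_in_group))
result <- df[keep, ]
result
```

With this variant, groups a and b lose 16 and 19 as before, while the one-value and two-value groups pass through untouched. If small groups should be treated differently, only the `length(x) >= 3` condition needs to change.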