How to exclude the most dissimilar value of a set in R?

I have a data frame like this one, but larger:

```r
values <- c(22,16,23,15,14.5,19)
groups <- rep(c("a","b"), each = 3)
df <- data.frame(groups, values)
```

I have between 1 and 3 values per group (in the example, 3 values for group a and 3 for group b). I now want to exclude the most dissimilar value from each group. In this example, I would want to exclude 16 from group a and 19 from group b.
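The question does not define "most dissimilar". If it is taken to mean "farthest from the group mean" (an assumption, though it is also the criterion the answer uses), the expected exclusions check out:

$$
\begin{aligned}
\bar{x}_a &= \frac{22 + 16 + 23}{3} \approx 20.33, &\quad |16 - \bar{x}_a| \approx 4.33 &> |23 - \bar{x}_a| \approx 2.67 > |22 - \bar{x}_a| \approx 1.67 \\
\bar{x}_b &= \frac{15 + 14.5 + 19}{3} \approx 16.17, &\quad |19 - \bar{x}_b| \approx 2.83 &> |14.5 - \bar{x}_b| \approx 1.67 > |15 - \bar{x}_b| \approx 1.17
\end{aligned}
$$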

Thank you for your help!

Answer:

If you want to discard one value per group, you can remove the observation with the largest absolute distance from its group's mean:

```r
library(dplyr)

df %>% 
  group_by(groups) %>% 
  mutate(dist = abs(values - mean(values))) %>%  # distance from the group mean
  filter(dist != max(dist))                      # drop the farthest value(s)
```

```
# A tibble: 4 × 3
# Groups:   groups [2]
  groups values  dist
  <chr>   <dbl> <dbl>
1 a        22    1.67
2 a        23    2.67
3 b        15    1.17
4 b        14.5  1.67
```
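The question allows 1 to 3 values per group, and there the `filter(dist != max(dist))` step behaves in ways that may be unexpected. A one-value group has `dist == 0 == max(dist)`, so it is dropped entirely. In a two-value group both values are equally far from the mean, so both can be dropped. The following is a base R sketch (no dplyr) that removes at most one value per group. The helper `keep_all_but_farthest` and its `min_n` threshold are hypothetical, and keeping groups with fewer than 3 values untouched is an assumption about what "most dissimilar" should mean there. On ties, `which.max()` drops only the first tied value.

```r
# Sample data from the question
values <- c(22, 16, 23, 15, 14.5, 19)
groups <- rep(c("a", "b"), each = 3)
df <- data.frame(groups, values)

# Hypothetical helper: returns a logical vector that is FALSE only for the one
# value farthest from the mean of `x`. Groups with fewer than `min_n` values are
# kept whole (assumption: with two values, both are equally far from the mean).
keep_all_but_farthest <- function(x, min_n = 3) {
  keep <- rep(TRUE, length(x))
  if (length(x) < min_n) return(keep)
  dist <- abs(x - mean(x))
  keep[which.max(dist)] <- FALSE  # which.max() returns the first index on ties
  keep
}

# ave() applies the helper within each group and returns the results in the
# original row order; since df$values is numeric, TRUE/FALSE come back as 1/0.
keep <- as.logical(ave(df$values, df$groups, FUN = keep_all_but_farthest))
result <- df[keep, ]
print(result)
```

For the sample data this keeps 22 and 23 in group a and 15 and 14.5 in group b, the same rows as the dplyr version.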