How to exclude most dissimilar value of set in R?

Time:05-11

I have a data frame that looks like this, but larger:

values <- c(22,16,23,15,14.5,19)
groups <- rep(c("a","b"), each = 3)
df <- data.frame(groups, values)

I have between 1 and 3 values per group (in the example, 3 values each for groups a and b). I now want to exclude the most dissimilar value from each group. In this example, I would want to exclude 16 from group a and 19 from group b.

Thank you for your help!

CodePudding user response:

If you're looking for one value to discard, you can remove the observation in each group that lies farthest from its group mean:

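library(dplyr)

df %>% 
  group_by(groups) %>% 
  # absolute distance of each value from its group mean
  mutate(dist = abs(values - mean(values))) %>% 
  # keep every row except the one(s) farthest from the mean
  filter(dist != max(dist))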

# A tibble: 4 × 3
# Groups:   groups [2]
  groups values  dist
  <chr>   <dbl> <dbl>
1 a        22    1.67
2 a        23    2.67
3 b        15    1.17
4 b        14.5  1.67
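
One thing to watch with the approach above: the question mentions groups with only 1 or 2 values. With `dist != max(dist)`, a single-value group loses its only row, and in a two-value group both values are equally far from the mean, so both are usually dropped. Below is a minimal base R sketch (not the dplyr method from the answer) that drops exactly one row per group and leaves groups with fewer than 3 values untouched. The extra groups `c` and `d` and the cutoff of 3 are assumptions added purely for illustration.

```r
# Example data from the question, plus two hypothetical small groups (c, d)
values <- c(22, 16, 23, 15, 14.5, 19, 7, 8, 30)
groups <- c("a", "a", "a", "b", "b", "b", "c", "c", "d")
df <- data.frame(groups, values)

# For each group, return 1 for rows to keep and 0 for the row to drop
keep_flag <- ave(df$values, df$groups, FUN = function(v) {
  # Assumption: with fewer than 3 values there is no meaningful outlier
  if (length(v) < 3) return(rep(1, length(v)))
  d <- abs(v - mean(v))
  # which.max() returns the first maximum, so ties drop only one row
  as.numeric(seq_along(v) != which.max(d))
})

result <- df[keep_flag == 1, ]
print(result)
```

Here 16 and 19 are removed as in the original example, while groups `c` and `d` keep all of their rows. Whether small groups should be kept or dropped depends on what you need, so adjust the length check accordingly.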