How to replace values which are not in the top 3 values by group (R)

Time:11-18

I am trying to set the names that are not among the top 3 values in each group to "Other" (the top 3 should be "TOP"). I tried this, and I really don't know why it's not working:

library(dplyr)

x <- data.frame(
  Groupe = c(rep("a", 10), rep("b", 10)),
  Value  = runif(20) * 20,
  # 10 names, recycled so that each group gets the same set
  Name   = c("aa", "bb", "cc", "dd", "ee",
             "ff", "zz", "yy", "oo", "uu")
)

f <- x %>%
  group_by(Groupe) %>%  
  mutate(test = ifelse(Name %in% slice_max(., order_by=Value, n=3)$Name, "TOP", "Other")) %>% 
  ungroup()

CodePudding user response:

It took me a minute to figure out why your code doesn't work. Inside mutate(), the . is the whole grouped data frame, not just the rows of the current group, so slice_max(., order_by=Value, n=3)$Name returns the top 3 Names of every group at once. Name %in% ... is then checked against that combined vector, and nothing ties the check to the current group.
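To make that concrete, here is a minimal sketch on a small hand-made data frame. The data frame d and its values are made up for illustration (they are not your random data); they are chosen so that the top 3 of the two groups together cover every name.

```r
library(dplyr)

# Hypothetical data with known values (not the asker's random data)
d <- data.frame(
  Groupe = rep(c("a", "b"), each = 4),
  Value  = c(4, 3, 2, 1,   1, 2, 3, 4),
  Name   = rep(c("aa", "bb", "cc", "dd"), times = 2)
)

# This is what slice_max(., ...) evaluates to inside mutate():
# the top 3 rows of *each* group, stacked together
top_all <- d %>%
  group_by(Groupe) %>%
  slice_max(order_by = Value, n = 3)

nrow(top_all)               # 6 rows: 3 from group a and 3 from group b
sort(unique(top_all$Name))  # the union of the top names across both groups

# The pattern from the question, applied to d
res <- d %>%
  group_by(Groupe) %>%
  mutate(test = ifelse(Name %in% slice_max(., order_by = Value, n = 3)$Name,
                       "TOP", "Other")) %>%
  ungroup()

table(res$test)             # every row ends up as "TOP"
```

With this data, "dd" is the lowest value in group a but still gets "TOP", because it is in the top 3 of group b.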

Instead, let's use rank() which works on a vector rather than a whole data frame and will thus work nicely inside the grouped operation:

x %>%
  group_by(Groupe) %>%  
  mutate(test = ifelse(Name %in% Name[rank(-Value) <= 3], "TOP", "Other")) %>% 
  ungroup()
# # A tibble: 20 × 4
#    Groupe  Value Name  test 
#    <chr>   <dbl> <chr> <chr>
#  1 a       5.34  aa    Other
#  2 a       7.72  bb    Other
#  3 a       0.268 cc    Other
#  4 a       7.65  dd    Other
#  5 a      17.4   ee    TOP  
#  6 a       6.81  ff    Other
#  7 a       9.64  zz    Other
#  8 a      12.0   yy    TOP  
#  9 a       9.87  oo    TOP  
# 10 a       3.72  uu    Other
# 11 b      16.5   aa    TOP  
# 12 b      13.4   bb    Other
# 13 b      15.9   cc    TOP  
# 14 b       2.16  dd    Other
# 15 b      14.5   ee    Other
# 16 b       8.23  ff    Other
# 17 b      16.4   zz    TOP  
# 18 b      12.9   yy    Other
# 19 b      15.7   oo    Other
# 20 b      11.1   uu    Other

You may want to check out the ?rank help page and make sure ties are treated as you like.
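As a rough illustration of why ties matter, the sketch below uses a made-up vector v with a tie for third place and compares a few ties.method settings of base rank() with dplyr's min_rank(). Which behaviour you want depends on your data.

```r
library(dplyr)

# Hypothetical values with a tie for third place
v <- c(10, 8, 5, 5, 1)

rank(-v)                         # 1.0 2.0 3.5 3.5 5.0 (default "average")
rank(-v, ties.method = "min")    # 1 2 3 3 5
rank(-v, ties.method = "first")  # 1 2 3 4 5
min_rank(desc(v))                # 1 2 3 3 5 (same as ties.method = "min")

# How many values pass the "<= 3" filter under each method
n_average <- sum(rank(-v) <= 3)                         # 2: both tied values dropped
n_min     <- sum(rank(-v, ties.method = "min") <= 3)    # 4: both tied values kept
n_first   <- sum(rank(-v, ties.method = "first") <= 3)  # 3: first occurrence wins
```

So with the default method a tie at the cutoff can leave you with fewer than 3 "TOP" rows, while "min" can give you more than 3 and "first" always gives exactly 3.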

CodePudding user response:

You could try something like this: first sort the rows by Value (descending) within each group, then use ifelse() with row_number() to label the first 3 rows of each group "TOP" and the rest "Other". Note that this reorders the rows.

x %>%
  group_by(Groupe) %>%
  arrange(desc(Value), .by_group = TRUE) %>%
  # alternative: mutate(test = ifelse(Value %in% head(Value, 3), "TOP", "Other")) %>%
  mutate(test = ifelse(row_number() <= 3, "TOP", "Other")) %>%
  ungroup()
# # A tibble: 20 × 4
#    Groupe  Value Name  test 
#    <fct>   <dbl> <fct> <chr>
#  1 a      19.6   ee    TOP  
#  2 a      18.7   cc    TOP  
#  3 a      18.6   yy    TOP  
#  4 a      17.8   ff    Other
#  5 a      15.9   dd    Other
#  6 a       8.84  bb    Other
#  7 a       5.79  zz    Other
#  8 a       5.78  oo    Other
#  9 a       5.38  aa    Other
# 10 a       1.78  uu    Other
# 11 b      18.8   dd    TOP  
# 12 b      11.7   uu    TOP  
# 13 b      11.7   oo    TOP  
# 14 b       9.68  bb    Other
# 15 b       7.17  aa    Other
# 16 b       2.96  ee    Other
# 17 b       2.22  cc    Other
# 18 b       1.91  yy    Other
# 19 b       1.62  ff    Other
# 20 b       0.925 zz    Other