I am trying to assign quantile groups for a stacked data such that for each category (r1 and r2 in my example) of data, I can classify the values into 5 groups. I can manage to do this using ntile() as follows.
r1<-rnorm(10,0,1)
r2<-rnorm(10,2,4)
df<-cbind(r1,r2)
df<-melt(df)
df<-df%>%group_by(Var2) %>% mutate(group=ntile(value,5))
However, what should I do if I hope to exclude the top and bottom 10% when sorting the groups. Ideally, I hope to keep those top and bottom values in the output table with their group code showing as "NA".
Thanks to anyone who can help!
CodePudding user response:
Your question is a little ambiguous. It is not clear whether you wish to exclude the top and bottom 10% from the quintile calculation (so that you are getting equal quintiles of the 10-90th centiles of the original data), or whether you want to do the quintiles first on all the data, then exclude the first and last 10%. Doing it the second way will give you smaller 1st and 5th quintiles, so I assume you mean the first method:
df %>%
group_by(Var2) %>%
mutate(group = ntile(value, 10)) %>%
mutate(group = ntile(ifelse(group %% 9 == 1, NA, value), 5))
#> # A tibble: 20 x 4
#> # Groups: Var2 [2]
#> Var1 Var2 value group
#> <int> <fct> <dbl> <int>
#> 1 1 r1 -0.626 1
#> 2 2 r1 0.184 2
#> 3 3 r1 -0.836 NA
#> 4 4 r1 1.60 NA
#> 5 5 r1 0.330 3
#> 6 6 r1 -0.820 1
#> 7 7 r1 0.487 3
#> 8 8 r1 0.738 5
#> 9 9 r1 0.576 4
#> 10 10 r1 -0.305 2
#> 11 1 r2 8.05 NA
#> 12 2 r2 3.56 2
#> 13 3 r2 -0.485 1
#> 14 4 r2 -6.86 NA
#> 15 5 r2 6.50 5
#> 16 6 r2 1.82 1
#> 17 7 r2 1.94 2
#> 18 8 r2 5.78 4
#> 19 9 r2 5.28 3
#> 20 10 r2 4.38 3
Just in case, the second method you would achieve like this:
df %>%
group_by(Var2) %>%
mutate(group = ntile(value, 5)) %>%
mutate(group = ifelse(ntile(value, 10) %% 9 == 1, NA, group))
#> # A tibble: 20 x 4
#> # Groups: Var2 [2]
#> Var1 Var2 value group
#> <int> <fct> <dbl> <int>
#> 1 1 r1 -0.626 2
#> 2 2 r1 0.184 3
#> 3 3 r1 -0.836 NA
#> 4 4 r1 1.60 NA
#> 5 5 r1 0.330 3
#> 6 6 r1 -0.820 1
#> 7 7 r1 0.487 4
#> 8 8 r1 0.738 5
#> 9 9 r1 0.576 4
#> 10 10 r1 -0.305 2
#> 11 1 r2 8.05 NA
#> 12 2 r2 3.56 3
#> 13 3 r2 -0.485 1
#> 14 4 r2 -6.86 NA
#> 15 5 r2 6.50 5
#> 16 6 r2 1.82 2
#> 17 7 r2 1.94 2
#> 18 8 r2 5.78 4
#> 19 9 r2 5.28 4
#> 20 10 r2 4.38 3
Created on 2022-02-19 by the reprex package (v2.0.1)
Setup and data used
library(dplyr)
library(reshape2)
set.seed(1)
r1 <- rnorm(10,0,1)
r2 <- rnorm(10,2,4)
df <- cbind(r1,r2)
df <- melt(df)