R: Replacing Ranges with their Percentiles

Time:12-28

I am working with the R programming language. Suppose I have the following data frame:

var_1 = rnorm(100,10,10)
var_2 = rnorm(100,10,10)
var_3 = rnorm(100,10,10)

d = data.frame(var_1, var_2, var_3)

head(d)


      var_1     var_2      var_3
1 14.251923 14.877801  22.636207
2  7.325137  8.513718  21.021522
3  3.400001 -3.400397  11.274797
4 16.400597  8.623980   9.366115
5  7.065583 13.155570  17.891432
6 21.297912  4.341385 -11.337330

My Question: For each element in each variable, I want to replace the element with the percentile it falls into, using a different set of percentiles for each variable.

For example:

a = quantile(d$var_1, c(0.15, 0.3, 0.35, 0.45, 0.5, 0.65, 0.7, 0.8, 0.85, 0.9, 0.95, 1))

b = quantile(d$var_2, c(0.16, 0.23, 0.65, 0.71, 0.95))

c = quantile(d$var_3, c(0.15, 0.28, 0.7, 0.73, 0.87))


> a
        5%        10%        15%        20%        25%        30%        35%        40%        45%        50%        55%        60%        65%        70%        75% 
-0.8806901  0.3595086  1.1201300  3.0581928  5.0901641  7.0056228  7.6089831  8.9853805  9.9264540 10.2235212 11.5707533 13.2422940 15.1076889 16.5354881 17.9336020 
       80%        85%        90%        95%       100% 
19.5312682 21.9264905 24.4511364 26.6820271 41.4419744 

> b
      16%       23%       65%       71%       95% 
-2.795294  1.430715 11.070815 12.688064 25.270823 

> c
      15%       28%       70%       73%       87% 
 0.958404  5.767591 15.258532 16.013648 20.467892 

Then, for instance:

  • if d$var_2 < -2.795294, then d$var_2 = 16th percentile
  • if d$var_3 is between 5.767591 and 15.258532, then d$var_3 = 70th percentile

I could write multiple "if" statements to do this manually, but is there a faster way?

Thanks!

Answer:

You could do this by writing a custom function around cut() and applying it to each column:

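library(tidyverse)

# Label each value with the name of the upper quantile of the bin it falls in,
# e.g. a value between the 30% and 35% quantiles becomes "35%".
# `y` should start at 0 and end at 1 so that every value falls in some bin.
ApplyQuantiles <- function(x, y) {
  q <- quantile(x, probs = y)
  cut(
    x,
    breaks = q,                # cut() stops with "'breaks' are not unique" if quantiles tie
    labels = names(q)[-1],     # bin (q[i-1], q[i]] gets the label of q[i]
    include.lowest = TRUE      # keep the minimum in the first bin
  )
}

output <- d %>% 
  mutate(var_1 = ApplyQuantiles(var_1, c(0, 0.15, 0.3, 0.35, 0.45, 0.5, 0.65, 0.7, 0.8, 0.85, 0.9, 0.95, 1)),
         var_2 = ApplyQuantiles(var_2, c(0, 0.16, 0.23, 0.65, 0.71, 0.95, 1)),
         var_3 = ApplyQuantiles(var_3, c(0, 0.15, 0.28, 0.7, 0.73, 0.87, 1))) %>% 
  # Turn labels such as "45%" into "45th percentile"
  mutate(across(everything(), ~ str_replace(.x, "%", "th percentile")))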
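One thing to watch in the last step: replacing "%" with "th percentile" always appends "th", so the 23% quantile of var_2 becomes "23th percentile", 71% becomes "71th percentile", and 73% in var_3 becomes "73th percentile". A minimal sketch that builds correct ordinal labels straight from the probabilities; `ordinal()` and `percentile_labels()` are hypothetical helpers, and the sketch assumes every probability is a whole-number percentage.

```r
# Hypothetical helper: English ordinal suffix for whole numbers
# (1st, 2nd, 3rd, 4th, ..., 11th, 12th, 13th, ..., 21st, 23rd, 100th).
ordinal <- function(n) {
  suffixes <- c("th", "st", "nd", "rd", "th", "th", "th", "th", "th", "th")
  # 11, 12, 13 (and 111, 112, ...) always take "th"; otherwise use the last digit
  suffix <- ifelse(n %% 100 %in% 11:13, "th", suffixes[n %% 10 + 1])
  paste0(n, suffix)
}

# Hypothetical helper: percentile labels built from the probabilities.
# round() guards against floating-point noise in 100 * probs.
percentile_labels <- function(probs) {
  paste(ordinal(round(100 * probs)), "percentile")
}

percentile_labels(c(0.16, 0.23, 0.65, 0.71, 0.95, 1))
```

To use it, pass `labels = percentile_labels(y[-1])` to cut() inside ApplyQuantiles() (where y is the vector of probabilities it receives) and drop the final str_replace() step.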

Output

head(output, 10)

               var_1            var_2            var_3
1    45th percentile  95th percentile  87th percentile
2    35th percentile 100th percentile  70th percentile
3    70th percentile  95th percentile  70th percentile
4    80th percentile  65th percentile  70th percentile
5    30th percentile  16th percentile  28th percentile
6    15th percentile  95th percentile  28th percentile
7    30th percentile  16th percentile  15th percentile
8    45th percentile  16th percentile  70th percentile
9    65th percentile  95th percentile  70th percentile
10   45th percentile  65th percentile  70th percentile
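A base R alternative, sketched under a few assumptions: `to_percentile()` is a hypothetical helper, it returns the percentile as a number (e.g. 16 rather than "16th percentile"), and `set.seed(1)` is added only to make the example reproducible. It follows the rule from the question: each value takes the smallest listed percentile whose quantile is at or above it, and values above the last listed quantile get 100. findInterval() does the lookup in one vectorized call, so no chain of if statements is needed.

```r
# Hypothetical helper: replace each value with the percentile (as a number)
# of the smallest listed quantile that is >= the value.
# The 100th percentile (the maximum) is always added so every value is covered.
to_percentile <- function(x, probs) {
  probs <- sort(unique(c(probs, 1)))
  q <- quantile(x, probs = probs, names = FALSE)
  # With left.open = TRUE, findInterval() counts how many entries of q are
  # strictly below each x; adding 1 gives the index of the first q >= x.
  idx <- findInterval(x, q, left.open = TRUE) + 1
  round(100 * probs[idx])
}

set.seed(1)  # assumption: only for a reproducible example
d <- data.frame(var_1 = rnorm(100, 10, 10),
                var_2 = rnorm(100, 10, 10),
                var_3 = rnorm(100, 10, 10))

# One vector of probabilities per column, as in the question
probs <- list(
  var_1 = c(0.15, 0.3, 0.35, 0.45, 0.5, 0.65, 0.7, 0.8, 0.85, 0.9, 0.95),
  var_2 = c(0.16, 0.23, 0.65, 0.71, 0.95),
  var_3 = c(0.15, 0.28, 0.7, 0.73, 0.87)
)

# Apply each column's probabilities to that column, keeping the data frame shape
d_pct <- d
d_pct[names(probs)] <- Map(to_percentile, d[names(probs)], probs)
head(d_pct)
```

Unlike cut(), findInterval() accepts tied break points, so this version does not stop with "'breaks' are not unique" when a column has many repeated values; tied quantiles simply resolve to the lowest of the tied percentiles.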