I am looking for a way to take a sorted vector and return, for each distinct value, the fraction of the way through the vector at which that value first appears.
The input vector and the expected result are shown below.
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)
ExpectedResult <- data.frame(Value = c(1, 2, 3), Percentile = c(0, 0.5, 0.8))
In this case, 1 appears at the 0th percentile, 2 at the 50th and 3 at the 80th.
CodePudding user response:
In base R, with rle() and cumsum():
p <- with(rle(InputVector), cumsum(lengths) / sum(lengths))
c(0, p[-length(p)])
#[1] 0.0 0.5 0.8
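The snippet above returns only the proportions. As a minimal sketch of how the same rle() idea could produce the data frame from the question, the code below pairs each run's value with its starting offset. The names runs, starts, and Result are illustrative, and the approach assumes the input is sorted so that each value forms exactly one run.

```r
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

runs <- rle(InputVector)

# 0-based start of each run = total length of all runs before it
starts <- cumsum(runs$lengths) - runs$lengths

Result <- data.frame(
  Value = runs$values,
  Percentile = starts / length(InputVector)
)
```

If the vector were not sorted, a value could appear in several runs and would show up in more than one row.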
CodePudding user response:
Using rank() and unique():
data.frame(
  Value = InputVector,
  Percentile = (rank(InputVector, ties.method = "min") - 1) / length(InputVector)
) |>
  unique()
#>   Value Percentile
#> 1     1        0.0
#> 6     2        0.5
#> 9     3        0.8
You could also use dplyr::percent_rank(), but note that it divides by n - 1 rather than n, so the percentiles come out differently:
library(dplyr)

tibble(
  Value = InputVector,
  Percentile = percent_rank(Value)
) %>%
  distinct()
#> # A tibble: 3 × 2
#>   Value Percentile
#>   <dbl>      <dbl>
#> 1     1      0
#> 2     2      0.556
#> 3     3      0.889
Created on 2022-11-09 with reprex v2.0.2
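The gap between the two results comes down to the denominator. Here is a rough base R sketch, assuming percent_rank() is equivalent to (min_rank - 1) / (n - 1) as its documentation describes. The variable names are illustrative only.

```r
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

n <- length(InputVector)
# Minimum rank: tied values all receive the rank of their first occurrence
min_rank <- rank(InputVector, ties.method = "min")

# Dividing by n gives the fraction of elements strictly before the value
by_n <- unique((min_rank - 1) / n)

# Dividing by n - 1 rescales so the largest possible rank maps to 1
by_n_minus_1 <- unique((min_rank - 1) / (n - 1))
```

With this input, by_n holds the 0, 0.5, 0.8 from the question, while by_n_minus_1 holds 0, 5/9, and 8/9, which print as the 0.556 and 0.889 seen in the tibble.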
CodePudding user response:
Use match() in base R:
(match(unique(InputVector), InputVector) - 1) / length(InputVector)
#[1] 0.0 0.5 0.8
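To get the data frame shape asked for in the question, the match() expression can be wrapped in a small helper. This is only a sketch: first_percentile is a hypothetical name, not an existing function.

```r
InputVector <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

# Hypothetical helper: one row per distinct value, with the 0-based position
# of its first occurrence divided by the vector length
first_percentile <- function(x) {
  vals <- unique(x)
  data.frame(
    Value = vals,
    Percentile = (match(vals, x) - 1) / length(x)
  )
}

Result <- first_percentile(InputVector)
```

Because match() returns the position of the first match, this version does not depend on the input being sorted. Values are reported in order of first appearance.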