I have a dataset that looks like the following:
INCOME | WEALTH |
---|---|
10.000 | 100000 |
15.000 | 111000 |
14.200 | 123456 |
12.654 | 654321 |
I have many more rows.
I now want to find how much INCOME a household at a specific WEALTH percentile has. The following quantiles are relevant:
c(0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.95,0.99)
I have always used the following code to get specific percentile values:
a <- quantile(WEALTH, probs = c(0.01,0.05,0.1,0.25,0.5,0.75,0.9,0.95,0.99))
But now I want to base my percentiles on WEALTH but get the respective INCOME. I have tried the following code but the results are not plausible:
df$percentile <- ntile(df$WEALTH, 100)
df <- df[df$percentile %in% c(1, 5, 10, 25, 50, 75, 90, 95, 99), ]
a <- df %>%
  group_by(percentile) %>%
  summarise(max = max(INCOME))
The results that I get are not consistent with other parts of the analysis that I have done. I assume that the percentiles from the "quantile" function are calculated differently than by simply taking the maximum within each bin.
CodePudding user response:
I'm not sure I understood your question correctly, but quantile() offers several calculation methods, selected with the type argument (the default is type 7). For example, I always use type 6, since this is what I was taught in my statistics courses.
type: an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.
You can read more about the different types by running ?quantile (the help page for quantile).
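To make the effect of type concrete, here is a minimal sketch using only the four rows shown in the question (an assumption; the real data has many more rows). It compares types 7, 6 and 1, then uses type 1, which only ever returns values that occur in the data, to look up the INCOME of the household sitting at each WEALTH quantile. The names probs and income_at_q are made up for illustration.

```r
# Toy data: the four rows shown in the question (assumption: the real data is larger)
df <- data.frame(
  INCOME = c(10, 15, 14.2, 12.654),
  WEALTH = c(100000, 111000, 123456, 654321)
)
probs <- c(0.25, 0.5, 0.75)

# The same probabilities can give different cut-offs under different algorithms
q7 <- quantile(df$WEALTH, probs = probs)            # default, type = 7 (interpolates)
q6 <- quantile(df$WEALTH, probs = probs, type = 6)  # interpolates with different weights
q1 <- quantile(df$WEALTH, probs = probs, type = 1)  # returns observed values only

# Because type = 1 yields WEALTH values that exist in the data, each quantile
# can be matched back to a row to read off that household's INCOME.
# match() takes the first row if several households share the same WEALTH.
income_at_q <- data.frame(
  prob   = probs,
  WEALTH = unname(q1),
  INCOME = df$INCOME[match(q1, df$WEALTH)]
)
print(income_at_q)
```

On these four rows, types 6 and 7 agree at the median but differ at the 25% and 75% points, which is the kind of discrepancy that shows up when results from different methods are compared. Whether a single household's INCOME is the right summary, rather than, say, an average over households near that quantile, depends on your analysis.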
CodePudding user response:
If you have fewer than 100 rows in your dataset, dplyr::ntile(x, 100) won’t yield accurate percentiles; it will only give you bins numbered from 1 through the total number of rows:
library(dplyr)
df %>%
  mutate(percentile = ntile(WEALTH, 100))
# A tibble: 4 × 3
INCOME WEALTH percentile
<dbl> <dbl> <int>
1 10 100000 1
2 15 111000 2
3 14.2 123456 3
4 12.7 654321 4
To get true percentiles, you can rescale the result, manually or with scales::rescale():
library(scales)
df %>%
  mutate(percentile = rescale(
    ntile(WEALTH, 100),
    c(1, 100)
  ))
# A tibble: 4 × 3
INCOME WEALTH percentile
<dbl> <dbl> <dbl>
1 10 100000 1
2 15 111000 34
3 14.2 123456 67
4 12.7 654321 100
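For the manual route, a minimal sketch of the same rescaling without the scales package could look like this. It assumes the four example rows from the question, and the column name bin and the object out are made up for illustration. It linearly maps the smallest bin to 1 and the largest to 100.

```r
library(dplyr)

# Toy data: the four rows shown in the question (assumption: the real data is larger)
df <- tibble(
  INCOME = c(10, 15, 14.2, 12.654),
  WEALTH = c(100000, 111000, 123456, 654321)
)

out <- df %>%
  mutate(
    bin = ntile(WEALTH, 100),
    # Linear map from [min(bin), max(bin)] onto [1, 100];
    # this divides by zero if all rows fall into a single bin
    percentile = 1 + (bin - min(bin)) / (max(bin) - min(bin)) * 99
  )
print(out)
```

Note that this spreads the bins evenly across 1 to 100 regardless of how far apart the WEALTH values are, so with few rows the result is still only a rough percentile rank.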