Suppose you have data that looks something like this:
df <- data.frame(income = rnorm(1000,77345,30569))
You add a factor column indicating which quartile interval each observation falls in:
df$quant <- cut(df$income, quantile(df$income))
The factor levels look something like this:
Levels: (-4.48e+04,5.6e+04] (5.6e+04,7.69e+04] (7.69e+04,9.73e+04] (9.73e+04,1.64e+05]
How can you programmatically, not manually, change the intervals so they print out nicely in a frequency summary table?
df %>% count(quant)
Which prints like this:
                quant   n
1 (-4.48e+04,5.6e+04] 249
2  (5.6e+04,7.69e+04] 250
3 (7.69e+04,9.73e+04] 250
4 (9.73e+04,1.64e+05] 250
I want it to look something like this:
                quant   n
1 (-$44,800,$56,000] 249
2  ($56,000,$76,900] 250
3  ($76,900,$97,300] 250
4 ($97,300,$164,000] 250
This is just for printing purposes (in an R Markdown report). I have already done all the calculations and plotting without a problem.
CodePudding user response:
cut2 can take a formatfun argument:
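library(Hmisc)   # cut2()
library(scales)  # comma()
# Note: cut2() labels its intervals as left-closed, [a,b), unlike cut()'s default (a,b]
df$quant2 <- cut2(df$income, digits = 5, cuts = quantile(df$income),
                  formatfun = function(x) paste0("$", comma(x)), onlycuts = TRUE)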
-output
> head(df)
     income             quant2               quant
1  60657.97  [$55,485,$76,547) (5.55e+04,7.65e+04]
2  93747.88  [$76,547,$96,620) (7.65e+04,9.66e+04]
3  90172.46  [$76,547,$96,620) (7.65e+04,9.66e+04]
4  59504.10  [$55,485,$76,547) (5.55e+04,7.65e+04]
5 103251.01 [$96,620,$178,251] (9.66e+04,1.78e+05]
6  85477.03  [$76,547,$96,620) (7.65e+04,9.66e+04]
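If you would rather stay in base R, the same idea can be sketched with the labels argument of cut(): format the breaks first, then hand the finished labels to cut(). This is only a sketch. The seed, the names fmt and interval_labels, and the choice to round to 3 significant digits (to mimic cut()'s default label precision) are my assumptions, not part of the question.

```r
library(scales)  # dollar()

set.seed(42)  # hypothetical seed, only to make the sketch reproducible
df <- data.frame(income = rnorm(1000, 77345, 30569))

breaks <- quantile(df$income)

# Round to 3 significant digits (assumed, to mirror cut()'s default labels),
# then format each break as currency.
fmt <- dollar(signif(breaks, 3))

# The first interval is closed on the left because include.lowest = TRUE below.
opening <- c("[", rep("(", length(fmt) - 2))

# Pair each lower bound with the next upper bound.
interval_labels <- paste0(opening, head(fmt, -1), ",", tail(fmt, -1), "]")

# include.lowest = TRUE keeps the minimum observation in the first bin
# instead of turning it into NA.
df$quant <- cut(df$income, breaks, labels = interval_labels,
                include.lowest = TRUE)

table(df$quant)
```

Because the labels are supplied up front, quant stays an ordinary factor with its levels in break order, so count(quant) or table() should list the intervals from lowest to highest.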
If we want to modify the original cut column instead:
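library(dplyr)    # %>%, mutate(), across()
library(tidyr)    # separate()
library(stringr)  # str_remove_all()
library(scales)   # dollar()
df <- df %>%
  # drop the "(" and "]" so only "lower,upper" remains
  mutate(quant = str_remove_all(quant, "\\(|\\]")) %>%
  # split into two columns and convert them to numeric
  separate(quant, into = c('q1', 'q2'), sep = ",", convert = TRUE) %>%
  # format each bound as currency, rebuild the label, drop the helper columns
  mutate(across(q1:q2, ~ dollar(.x)),
         quant = glue::glue("({q1},{q2}]"), q1 = NULL, q2 = NULL)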
-output
> head(df)
     income              quant
1  60657.97  ($55,500,$76,500]
2  93747.88  ($76,500,$96,600]
3  90172.46  ($76,500,$96,600]
4  59504.10  ($55,500,$76,500]
5 103251.01 ($96,600,$178,000]
6  85477.03  ($76,500,$96,600]
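One thing to keep in mind with the approach above: glue() returns a character vector, so quant is no longer a factor afterwards, and a summary table may sort the intervals alphabetically rather than by value. If that matters, a possible alternative is to rewrite the factor levels in place. The helper name format_interval and the seed are hypothetical, and the sketch assumes every level has the "(lower,upper]" shape that cut() produces.

```r
library(scales)  # dollar()

# Hypothetical helper: turn one cut() label such as "(5.55e+04,7.65e+04]"
# into "($55,500,$76,500]", keeping the original bracket characters.
format_interval <- function(label) {
  n <- nchar(label)
  open_br  <- substr(label, 1, 1)
  close_br <- substr(label, n, n)
  # Everything between the brackets, split on the comma, parsed as numbers.
  bounds <- as.numeric(strsplit(substr(label, 2, n - 1), ",")[[1]])
  paste0(open_br, paste(dollar(bounds), collapse = ","), close_br)
}

set.seed(42)  # hypothetical seed, only to make the sketch reproducible
df <- data.frame(income = rnorm(1000, 77345, 30569))
df$quant <- cut(df$income, quantile(df$income))

# Replacing the levels keeps quant a factor, with the levels in their
# original (lowest to highest) order.
levels(df$quant) <- vapply(levels(df$quant), format_interval,
                           character(1), USE.NAMES = FALSE)

table(df$quant)
```

Only the level labels change here; the underlying integer codes are untouched, so anything already computed from quant is unaffected.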