How to automatically relabel interval factor levels for better display

Time:05-16

Suppose you have data that looks something like this

df <- data.frame(income = rnorm(1000,77345,30569))

You add a factor column indicating the quartile interval that each observation falls into

df$quant <- cut(df$income, quantile(df$income))

The factor levels look something like this

Levels: (-4.48e+04,5.6e+04] (5.6e+04,7.69e+04] (7.69e+04,9.73e+04] (9.73e+04,1.64e+05]

How can you programmatically, not manually, change the intervals so they print out nicely in a frequency summary table?

library(dplyr)
df %>% count(quant)

Which prints like this:

               quant   n
1 (-4.48e+04,5.6e+04] 249
2  (5.6e+04,7.69e+04] 250
3 (7.69e+04,9.73e+04] 250
4 (9.73e+04,1.64e+05] 250

I want it to look something like this

              quant   n
1 (-$44,800,$56,000] 249
2  ($56,000,$76,900] 250
3  ($76,900,$97,300] 250
4 ($97,300,$164,000] 250

This is just for printing purposes (in an R Markdown report). I have already done all the calculations and plotting without a problem.

CodePudding user response:

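cut2 from the Hmisc package takes a formatfun argument, a function that is applied to the cut points when the interval labels are built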

library(Hmisc)
library(scales)
df$quant2 <-  cut2(df$income,digits = 5, cuts = quantile(df$income), 
   formatfun = function(x) paste0("$", comma(x)), onlycuts = TRUE)

-output

> head(df)
     income             quant2               quant
1  60657.97  [$55,485,$76,547) (5.55e+04,7.65e+04]
2  93747.88  [$76,547,$96,620) (7.65e+04,9.66e+04]
3  90172.46  [$76,547,$96,620) (7.65e+04,9.66e+04]
4  59504.10  [$55,485,$76,547) (5.55e+04,7.65e+04]
5 103251.01 [$96,620,$178,251] (9.66e+04,1.78e+05]
6  85477.03  [$76,547,$96,620) (7.65e+04,9.66e+04]

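If we want to modify the original cut column instead (note that quant is then no longer a factor)

library(dplyr)
library(tidyr)
library(stringr)
df <- df %>%
     # drop the enclosing "(" and "]" from each interval label
     mutate(quant = str_remove_all(quant, "\\(|\\]")) %>% 
     # split into lower and upper bounds; convert = TRUE parses them as numbers
     separate(quant, into = c('q1', 'q2'), sep=",", convert = TRUE) %>% 
     # format the bounds as dollars, rebuild the label, drop the helper columns
     mutate(across(q1:q2, ~ dollar(.x)), 
     quant = glue::glue("({q1},{q2}]"), q1 = NULL, q2 = NULL)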

-output

> head(df)
     income              quant
1  60657.97  ($55,500,$76,500]
2  93747.88  ($76,500,$96,600]
3  90172.46  ($76,500,$96,600]
4  59504.10  ($55,500,$76,500]
5 103251.01 ($96,600,$178,000]
6  85477.03  ($76,500,$96,600]
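
A base R alternative, sketched here under a few assumptions, is to build the labels from the quantile breaks and pass them to cut() through its labels argument. That keeps quant a factor, so the rows of a frequency table stay in interval order instead of sorting as text. fmt_dollar is a hypothetical helper written for this example (it is not from any package), rounding to the nearest $100 is an assumption taken from the desired output in the question, and the seed is arbitrary. include.lowest = TRUE is also an addition: without it the minimum income falls outside the left-open first interval and becomes NA.

```r
# Hypothetical helper (not from any package): round to the nearest $100,
# then format with a dollar sign and thousands separators.
fmt_dollar <- function(x) {
  x <- round(x, -2)
  paste0(ifelse(x < 0, "-", ""), "$",
         formatC(abs(x), format = "d", big.mark = ","))
}

set.seed(42)  # arbitrary seed, only so the example is reproducible
df <- data.frame(income = rnorm(1000, 77345, 30569))

brks <- quantile(df$income)   # five break points: 0%, 25%, 50%, 75%, 100%
lower <- head(brks, -1)       # left endpoint of each interval
upper <- tail(brks, -1)       # right endpoint of each interval

# With include.lowest = TRUE the first interval is closed on the left,
# so it is labelled with "[" and the remaining ones with "(".
left_bracket <- c("[", rep("(", length(lower) - 1))
labs <- paste0(left_bracket, fmt_dollar(lower), ",", fmt_dollar(upper), "]")

df$quant <- cut(df$income, breaks = brks, labels = labs,
                include.lowest = TRUE)

table(df$quant)  # counts appear in interval order because quant is a factor
```

The same df %>% count(quant) call from the question should then print the dollar-formatted labels directly.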