How to cut a continuous skewed variable into categories from extremely high to extremely low?

Time:12-02

I have a continuous variable in my dataset with the following distribution:

summary(emissions$NMVOC_gram)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
       0      256      547    15802     1074 50818630 

How can I categorize this variable into unequal levels (extremely low, low, medium, high, and extremely high) in R or Excel? I have added a picture of what I should end up with.

Thank you for the help.

I tried the cut function in R, but the result was not what I expected. I do not know how I should define the breaks; in my data the 3rd quartile is lower than the mean.

CodePudding user response:

Assuming you want to cut the data into quintiles (5 categories), here is an example using the built-in iris dataset. The plot shows counts only, not percentages.

library(tidyverse)

# Quintile break points: 0%, 20%, 40%, 60%, 80%, 100%
xs <- quantile(iris$Sepal.Length, c(0, 1/5, 2/5, 3/5, 4/5, 1))

iris <- iris %>%
  mutate(Sepal_length_cat = cut(Sepal.Length, breaks = xs,
                                labels = c("ext low", "low", "med", "high", "ext high"),
                                include.lowest = TRUE))  # otherwise the minimum value becomes NA

# Horizontal bar chart of the count in each category
ggplot(iris, aes(Sepal_length_cat)) +
  geom_bar() +
  coord_flip()
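
Since the question asks for unequal levels on a heavily skewed variable, the same idea can be sketched with unequal quantile probabilities instead of quintiles. This is only an illustration under assumptions: the data are simulated with rlnorm as a stand-in for emissions$NMVOC_gram, and the 5%/25%/75%/95% cut points are arbitrary choices, not values taken from the question.

```r
# Simulated right-skewed data (hypothetical stand-in for emissions$NMVOC_gram)
set.seed(1)
x <- rlnorm(1000, meanlog = 6, sdlog = 1.5)

# Unequal cut points chosen for illustration: 5th, 25th, 75th and 95th percentiles
probs  <- c(0, 0.05, 0.25, 0.75, 0.95, 1)
breaks <- quantile(x, probs = probs)

# include.lowest = TRUE keeps the minimum value in the first category
x_cat <- cut(x, breaks = breaks,
             labels = c("ext low", "low", "med", "high", "ext high"),
             include.lowest = TRUE)

# Count of observations per category
table(x_cat)
```

With real data that contain many identical values (for example many zeros), two quantiles can coincide, and cut() then stops with an error that the breaks are not unique. In that case the probabilities need adjusting, or the breaks can be set by hand as fixed thresholds.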

