I have a continuous variable in my dataset with the following distribution:
summary(emissions$NMVOC_gram)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
       0      256      547    15802     1074 50818630
How can I categorize this variable into unequal levels (extremely low, low, medium, high and extremely high) in R or Excel? I have added a picture of what I should end up with.
Thank you for the help.
I tried the cut function in R, but the result was not what I expected. Actually, I do not know how I should define the breaks: in my data the 3rd quartile is lower than the mean.
CodePudding user response:
Presuming you want to cut the data into quintiles (5 categories). I have included only counts rather than percentages, and used the built-in iris data as an example.
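library(tidyverse)
# Quintile break points (0%, 20%, ..., 100%) of the variable
xs <- quantile(iris$Sepal.Length, c(0, 1/5, 2/5, 3/5, 4/5, 1))
# The break points as a data frame, handy for inspection
xs2 <- as.data.frame(xs)
iris <- iris %>%
  # include.lowest = TRUE keeps the minimum value in the first class instead of NA
  mutate(Sepal_legth_cat = cut(Sepal.Length, breaks = xs, include.lowest = TRUE,
                               labels = c("ext low", "low", "med", "high", "ext high")))
# Bar chart of the counts per category; layers must be chained with +
ggplot(iris, aes(Sepal_legth_cat)) +
  geom_bar() +
  coord_flip()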
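For a heavily skewed variable like NMVOC_gram, the same quantile-plus-cut approach can be used with unequal percentile cut points. This is only a minimal sketch: the emissions data frame below is simulated with rlnorm() as a stand-in for the real data, and the 5/25/75/95 percentiles and the column name NMVOC_cat are assumptions picked for illustration, not values taken from the question.

```r
# Simulated right-skewed data standing in for emissions$NMVOC_gram (assumption)
set.seed(42)
emissions <- data.frame(NMVOC_gram = rlnorm(1000, meanlog = 6, sdlog = 2))

# Unequal classes: these percentiles are illustrative, adjust them to your needs
probs  <- c(0, 0.05, 0.25, 0.75, 0.95, 1)
labels <- c("extremely low", "low", "medium", "high", "extremely high")
breaks <- quantile(emissions$NMVOC_gram, probs = probs)

# include.lowest = TRUE keeps the minimum value in the first class instead of NA
emissions$NMVOC_cat <- cut(emissions$NMVOC_gram, breaks = breaks,
                           labels = labels, include.lowest = TRUE)

# Number of observations per category
table(emissions$NMVOC_cat)
```

Because the breaks come from percentiles rather than from the mean, a few very large values do not distort the classes. If many values are tied (for example a lot of zeros), quantile() can return duplicate break points and cut() will then stop with an error that the breaks are not unique; in that case choose different percentiles or set the breaks by hand.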