I am trying to create a barplot from a categorical variable, but it has an overwhelming number of categories, and about a third of them have very low frequencies, which crowds the plot and makes it impossible to interpret.
I want to exclude these rare categories from the table I create, so that it only includes categories with, say, 10 or more instances.
How I made my plot:
x <- table(Data$Variable)
barplot(x)
I tried some other suggestions on excluding categories in a variable, but I am new to R and programming in general so I don't really understand the nuance.
CodePudding user response:
Suppose your data is like this:
set.seed(2022)
Data <- data.frame(variable = sample(LETTERS[1:10], 100, TRUE))
x <- table(Data$variable)
x
#>
#>  A  B  C  D  E  F  G  H  I  J
#> 12 11 13  8  8  7 12 13  6 10
Then your plot will be something like this:
barplot(x)
If we only want the bars for categories with 10 or more instances, as asked in the question, we can subset the table with a logical condition:
barplot(x[x >= 10])
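To make the cutoff easy to change, the threshold can be pulled out into a variable. This is only a minimal sketch: the names `counts`, `min_count`, and `keep` are made up for illustration, and the toy counts are constructed by hand so the result is easy to check. It uses `>=` so that a category with exactly 10 instances is kept, matching "10 or more" in the question.

```r
# Toy data with known counts: A = 12, B = 3, C = 10 (illustrative only)
counts <- table(rep(c("A", "B", "C"), times = c(12, 3, 10)))

# Hypothetical cutoff: keep categories with at least this many instances
min_count <- 10

# Logical subsetting keeps the category names attached to the counts
keep <- counts[counts >= min_count]

barplot(keep)
```

Here `keep` contains only A and C, because B falls below the cutoff. With your own data you would replace the first line with `counts <- table(Data$Variable)`.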
And if we only want the five highest bars (in order of size), we can do:
barplot(rev(sort(x))[1:5])
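An alternative way to write the same idea is `sort()` with `decreasing = TRUE`, which should give the same largest-first ordering without the extra `rev()` call. The sketch below is an assumption-laden illustration rather than the only way to do it: `counts`, `n_top`, `sorted`, and `top` are hypothetical names, and the toy counts have no ties so the order is unambiguous.

```r
# Toy data with known counts: A = 5, B = 12, C = 9, D = 20 (illustrative only)
counts <- table(rep(c("A", "B", "C", "D"), times = c(5, 12, 9, 20)))

# Hypothetical number of bars to keep
n_top <- 2

# Sort from largest to smallest count
sorted <- sort(counts, decreasing = TRUE)

# Take the first n_top entries, without indexing past the end
# if there are fewer categories than n_top
top <- sorted[seq_len(min(n_top, length(sorted)))]

barplot(top)
```

In this toy example `top` holds D and B, in that order. Using `seq_len(min(...))` instead of a fixed `1:5` is a small safeguard for variables that happen to have fewer categories than the number of bars requested.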
Created on 2022-11-14 with reprex v2.0.2