Excluding categories less than x from a barplot in R

Time:11-15

I am trying to create a barplot from a categorical variable, but there is an overwhelming number of categories, and a third of them have very low frequencies, which crowds the plot and makes it impossible to interpret.

I want to find a way to exclude these rare categories from the table I create, so that it only includes categories with, say, 10 or more instances.

How I made my plot:

x <- table(Data$Variable)

barplot(x)

I tried some other suggestions for excluding categories from a variable, but I am new to R and programming in general, so I don't really understand the nuances.

CodePudding user response:

Suppose your data is like this:

set.seed(2022)

Data <- data.frame(variable = sample(LETTERS[1:10], 100, TRUE))

x <- table(Data$variable)

x
#> 
#>  A  B  C  D  E  F  G  H  I  J 
#> 12 11 13  8  8  7 12 13  6 10

Then your plot will be something like this:

barplot(x)

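If we want only the bars with counts above 10 in our plot, we can do the following (use x >= 10 instead to also keep categories with exactly 10 instances, as the question asks):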

barplot(x[x > 10])
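
To see the filtering step on its own, here is a minimal sketch that uses a small made-up data set with known counts instead of the random sample above. The data frame, the counts, and the names `min_count` and `keep` are assumptions chosen for illustration only.

```r
# Hypothetical data: A appears 12 times, B 10 times, C 3 times, D once
Data <- data.frame(variable = rep(c("A", "B", "C", "D"), times = c(12, 10, 3, 1)))

x <- table(Data$variable)

# Keep only the categories with at least min_count instances
min_count <- 10
keep <- x[x >= min_count]

# Only A and B remain in the filtered table, so only they are plotted
barplot(keep)
```

Because `x >= min_count` is a logical vector with one entry per category, indexing the table with it drops the rare categories while keeping their names attached to the remaining counts.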

And if we only want the five highest bars (in order of size), we can do:

barplot(rev(sort(x))[1:5])
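
The top-n selection can also be written with an explicit sort direction, which some readers may find easier to follow than `rev(sort(x))`. This is a sketch on made-up data with no tied counts; the data frame and the names `n_top` and `top` are assumptions for illustration.

```r
# Hypothetical data with distinct counts per category
Data <- data.frame(variable = rep(c("A", "B", "C", "D", "E"), times = c(12, 10, 3, 1, 7)))

x <- table(Data$variable)

# Sort the counts from largest to smallest, then keep the first n_top entries
n_top <- 3
top <- sort(x, decreasing = TRUE)[seq_len(n_top)]

# Plots A, B and E, tallest bar first
barplot(top)
```

When counts are tied, the order of the tied bars may differ between this form and `rev(sort(x))`, although the counts shown are the same.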

Created on 2022-11-14 with reprex v2.0.2
