Excluding levels/groups within categorical variable (ggplot graph)

Time:05-08

I am relatively new to ggplot, and I want to visualize a categorical variable with 11 groups/levels. I ran the code below to produce a bar graph showing the frequency of each group. However, some groups within the categorical variable "active" occur only once or not at all, and they clutter the graph. Is it possible to exclude, directly in ggplot, the groups within the categorical variable that have < 2 observations?

I am also open to recommendations on how to visualize a categorical variable with multiple groups/levels if a bar graph isn't suitable here.

Data type

sapply(df,class)
   username      active 
"character" "character" 
ggplot(data = df, aes(x = active)) +
  geom_bar()


Answer:

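You can count() the categories first and then filter() before passing the result to ggplot. Because the counts are already computed at that point, you would use geom_col() instead of geom_bar():

library(dplyr)
library(ggplot2)

# keep only groups with at least 2 observations
df %>% count(active) %>% filter(n >= 2) %>%
  ggplot(aes(x = active, y = n)) +
  geom_col()

Alternatively, you could group_by() / filter() directly within your ggplot() call, like this:

# same threshold, but the rows are filtered and geom_bar() does the counting
ggplot(df %>% group_by(active) %>% filter(n() >= 2), aes(x = active)) +
  geom_bar()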
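
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The toy data frame, the min_n threshold, and the object names (counts, df_lumped, p_drop, p_lump) are made up for illustration and are not from the question. It also shows one possible alternative that was not part of the answer above: instead of dropping the rare groups, forcats::fct_lump_min() can collapse them into a single "Other" bar, so no observations disappear from the plot.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Hypothetical toy data standing in for the asker's df
df <- data.frame(
  username = paste0("user", 1:10),
  active   = c("daily", "daily", "daily", "daily",
               "weekly", "weekly", "weekly",
               "monthly", "monthly",
               "never"),
  stringsAsFactors = FALSE
)

min_n <- 2  # assumed threshold: keep groups with at least this many rows

# Option 1: count first, drop rare groups, then plot the counts
counts <- df %>%
  count(active) %>%
  filter(n >= min_n)

p_drop <- ggplot(counts, aes(x = active, y = n)) +
  geom_col()

# Option 2: keep every row, but lump groups with fewer than min_n rows
# into a single "Other" level, and let geom_bar() do the counting
df_lumped <- df %>%
  mutate(active = fct_lump_min(active, min = min_n))

p_lump <- ggplot(df_lumped, aes(x = active)) +
  geom_bar()
```

Printing p_drop or p_lump draws the plot. Which option is preferable probably depends on whether the rare groups matter to the reader: dropping gives the cleanest graph, while lumping keeps the total count visible.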