Error in ggplot2 when using both fill and group parameters in geom_bar

Time:11-21

There seems to be a problem with R's ggplot2 library when I include both the fill and group parameters in a bar plot (geom_bar()). I've already tried looking for answers for several hours but couldn't find one that would help. This is actually my first post here.

To give a little background, I have a dataframe named smokement (short for smoke and mental health), a categorical variable named smoke100 (smoked in the past 100 days?) with "Yes" and "No", and another categorical variable named misnervs (frequency of feelings of nervousness) with 5 possible values: "All", "Most", "Some", "A little", and "None."

When I run this code, I get this result:

ggplot(data = smokement) +
  geom_bar(aes(x = smoke100, fill = smoke100)) +
  facet_wrap(~misnervs, nrow = 1)

First code output

However, what I want is for each facet's bars to show their respective proportions. From reading a bit of the "R for Data Science" book, I found out that I need to include y = ..prop.. and group = 1 in aes() to achieve this:

ggplot(data = smokement) +
  geom_bar(aes(x = smoke100, y = ..prop.., group = 1)) +
  facet_wrap(~misnervs, nrow = 1)

Second code output
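For reference, the per-facet proportions that y = ..prop.. with group = 1 draws can be checked by hand in base R. This is only a minimal sketch: toy is a small made-up data frame standing in for smokement (it is not the real survey data), and counts and props are illustrative names.

```r
# Hypothetical toy data standing in for smokement (not the real survey data).
toy <- data.frame(
  smoke100 = factor(c("Yes", "No", "Yes", "Yes", "No", "No", "No", "Yes"),
                    levels = c("Yes", "No")),
  misnervs = factor(c("All", "All", "All", "All", "None", "None", "None", "None"),
                    levels = c("All", "None"))
)

# Counts per facet (rows) and per bar (columns).
counts <- table(toy$misnervs, toy$smoke100)

# Row-wise proportions: within each misnervs level, the share of "Yes" and "No".
props <- prop.table(counts, margin = 1)
print(props)
```

Each row of props sums to 1, which is what each facet of the second plot should show: the bar heights within one misnervs level add up to 1.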

Finally, I tried adding fill = smoke100 to aes() to show this categorical variable in color, just as I did in the first code. But when I add the fill parameter, it doesn't work! The code runs, but it produces exactly the same output as the second code, as if the fill parameter were somehow ignored this time!

ggplot(data = smokement) +
  geom_bar(aes(x = smoke100, y = ..prop.., group = 1, fill = smoke100)) +
  facet_wrap(~misnervs, nrow = 1)

Third code output

Does anyone have an idea of why this happens, and how to solve it? My end goal is to display each value of smoke100 (the "Yes" and "No" bars) with colors and a legend at the right, just like on the first graph, while having each grouping level of "misnervs" display their respective proportions of smoke100 ("Yes", "No") levels, just like on the second graph.

EDIT:

> dim(smokement)
[1] 35471     6
> str(smokement)
'data.frame':   35471 obs. of  6 variables:
 $ smoke100: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 1 1 1 1 1 ...
 $ misnervs: Factor w/ 5 levels "All","Most","Some",..: 3 4 5 4 1 5 3 3 5 5 ...
 $ mishopls: Factor w/ 5 levels "All","Most","Some",..: 3 5 5 5 5 5 5 5 5 5 ...
 $ misrstls: Factor w/ 5 levels "All","Most","Some",..: 3 5 5 3 1 5 3 5 1 5 ...
 $ misdeprd: Factor w/ 5 levels "All","Most","Some",..: 5 5 5 5 4 5 5 5 5 5 ...
 $ miswtles: Factor w/ 5 levels "All","Most","Some",..: 5 5 5 5 5 5 5 5 5 5 ...
> head(smokement)
  smoke100 misnervs mishopls misrstls misdeprd miswtles
1      Yes     Some     Some     Some     None     None
2       No A little     None     None     None     None
3      Yes     None     None     None     None     None
4       No A little     None     Some     None     None
5      Yes      All     None      All A little     None
6      Yes     None     None     None     None     None

As for the output without group = 1

ggplot(data = smokement) +
  geom_bar(aes(x = smoke100, y = ..prop.., fill = smoke100)) +
  facet_wrap(~misnervs, nrow = 1)

No group code output

CodePudding user response:

Besides the solution offered, note the switch from geom_bar to geom_col: geom_bar plots row counts, while geom_col plots values that are already in the data.
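A minimal sketch of what a geom_col version could look like, assuming the proportions are computed beforehand with dplyr on the built-in diamonds data. The names diamond_props and prop are illustrative choices, not taken from the original answer.

```r
library(ggplot2)
library(dplyr)

# Count rows per facet (cut) and bar (color), then turn the counts into
# within-facet proportions. diamond_props and prop are hypothetical names.
diamond_props <- diamonds %>%
  count(cut, color, name = "n") %>%
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# geom_col plots the precomputed prop values directly, so fill can be
# mapped to color without any group = 1 workaround.
p <- ggplot(diamond_props) +
  geom_col(aes(x = color, y = prop, fill = color)) +
  facet_wrap(~cut)
p
```

Because the proportions are ordinary columns in the data, they are easy to inspect or check before plotting. For the question's data, the same pattern would presumably apply with misnervs in place of cut and smoke100 in place of color.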

As a rough-and-ready QC, here's the equivalent of your code that produces the "all grey" plot:

library(ggplot2)
library(dplyr)  # provides %>%

diamonds %>%
  ggplot() +
    geom_bar(aes(x = color, y = ..prop.., fill = color, group = 1)) +
    facet_wrap(~cut)
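The bars most likely come out grey because group = 1 puts every row into a single group, and a fill that varies within one group is dropped when the stat computes the counts. One way to keep geom_bar and still get colors is to map fill after the stat has run. The sketch below assumes ggplot2 3.3.0 or later (for after_stat()); the scale_fill_discrete() relabelling is an added assumption to restore readable legend labels.

```r
library(ggplot2)

p <- ggplot(diamonds) +
  # prop is computed per facet with all bars in one group (group = 1);
  # fill is taken from the stat's x positions, so it survives the grouping.
  geom_bar(aes(x = color, y = after_stat(prop),
               fill = after_stat(factor(x)), group = 1)) +
  # after_stat(x) holds bar positions 1, 2, ..., so relabel the legend
  # with the real factor levels.
  scale_fill_discrete(name = "color", labels = levels(diamonds$color)) +
  facet_wrap(~cut)
p
```

For the question's data, the analogous call would presumably use x = smoke100, labels = levels(smokement$smoke100), and facet_wrap(~misnervs, nrow = 1).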
