Removing NA category from grouped bar charts

Time:01-28

I am currently working with survey data with 250 columns. A sample of my data looks like this:

q1 <- factor(c("yes",NA,"no","yes",NA,"yes","no","yes"))
q2 <- factor(c("Albania","USA","Albania","Albania","UK",NA,"UK","Albania"))
q3 <- factor(c(0,1,NA,0,1,1,NA,0))
q4 <- factor(c(0,NA,NA,NA,1,NA,0,0))
q5 <- factor(c("Dont know","Prefer not to answer","Agree","Disagree",NA,"Agree","Agree",NA))
q6 <- factor(c(1,NA,3,5,800,NA,900,2))
sector <- factor(c("Energy","Water","Energy","Other","Other","Water","Transportation","Energy"))

data <- data.frame(q1,q2,q3,q4,q5,q6,sector)

I have created a function that loops through all 250 columns and creates grouped bar charts where the x axis shows sectors, the y axis shows the percentage distribution of answers, and the fill is the underlying column from data. The code for the function is below:

by_sector <- lapply(names(data), function(variable) {
  ggplot(
    data = data,
    mapping = aes(x=sector,fill = data[[variable]])
  ) +
    geom_bar(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..]), position="dodge") +
    labs(x = variable, y = "% of total", fill = "Response", caption = paste("Total =", sum(!is.na(data[[variable]])))) +
    geom_text(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..], label=scales::percent(..count../tapply(..count.., ..x.. ,sum)[..x..],accuracy = 0.1) ),
              stat="count", position=position_dodge(1), vjust=0.5) +
    #scale_fill_brewer(palette = "Accent") +
    scale_fill_discrete(na.translate = FALSE) +
    theme_bw() +
    theme(panel.grid.major.y = element_blank()) +
    coord_flip()
})

Because I use the data columns as fill, a transparent NA category shows up in the resulting plots. I want to remove that category from the grouped bars.

I tried a couple of things:

scale_fill_discrete(na.translate = FALSE): this only removed NA from the legend, not from the grouped bars.
fill = subset(data,!is.na(data[[variable]])): this didn't work.
ggplot(data=na.omit(data[[variable]])): this didn't work either.

Is there a way to modify my bar plot code so that the NA category doesn't show up as a bar in the graph? Thank you very much in advance!

Answer:

One option would be to aggregate your data outside of ggplot(). This makes the code easier to debug, removes the duplicated computations inside the plotting code, and makes it easy to drop the NA categories if desired.

Additionally, I moved the plotting code to a separate function, which also allows for easier debugging, e.g. by running the code for just one column.

Finally, note that I switched to the .data pronoun, which is the recommended way to use column names passed as strings.
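As a minimal sketch of how the .data pronoun behaves with a column name held in a string (the data frame toy and the object answered are made-up names for illustration, not part of the question's data):

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical three-row stand-in for the survey data
toy <- data.frame(
  q1 = c("yes", NA, "no"),
  sector = c("Energy", "Water", "Energy")
)

# Column name stored as a string, as it is inside lapply(names(data), ...)
variable <- "q1"

# .data[[variable]] looks the column up in the data frame being processed
answered <- toy |>
  filter(!is.na(.data[[variable]]))

nrow(answered)  # 2 rows remain, the NA row is dropped
```

The same lookup is what aes(fill = .data[[variable]]) relies on in the plotting code, so the fill always comes from the data frame handed to ggplot() rather than from a separate vector.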

Showing only the plots for two of the problematic columns:

EDIT: Fixed a small bug by removing the NA values before aggregating instead of afterwards.

library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

plot_fun <- function(variable) {
  # Number of non-missing answers, shown in the caption
  total <- sum(!is.na(data[[variable]]))
  
  # Drop NA answers first, then compute the share of each answer per sector
  data <- data |> 
    filter(!is.na(.data[[variable]])) |> 
    group_by(across(all_of(c("sector", variable)))) |> 
    summarise(n = n(), .groups = "drop_last") |> 
    mutate(pct = n / sum(n)) |> 
    ungroup()
  
  ggplot(
    data = data,
    mapping = aes(x = sector, y = pct, fill = .data[[variable]])
  ) +
    geom_col(position = "dodge") +
    labs(
      x = variable, y = "% of total", fill = "Response",
      caption = paste("Total =", total)
    ) +
    geom_text(
      aes(
        label = scales::percent(pct, accuracy = 0.1)
      ),
      position = position_dodge(.9), vjust = 0.5
    ) +
    scale_fill_brewer(palette = "Accent") +
    theme_bw() +
    theme(panel.grid.major.y = element_blank()) +
    coord_flip()
}

by_sector <- lapply(names(data), plot_fun)


by_sector[c(3, 6)]
#> [[1]]

#> 
#> [[2]]
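To see why the order of filtering and aggregating matters, here is a small sketch using the q3 and sector values from the question. The names survey, share_by_sector, na_first and na_last are made up for this illustration, and the comparison is my reading of the EDIT note rather than the exact earlier version of the answer.

```r
library(dplyr, warn.conflicts = FALSE)

# Same values as q3 and sector in the question; `survey` is a hypothetical name
survey <- data.frame(
  q3 = factor(c(0, 1, NA, 0, 1, 1, NA, 0)),
  sector = factor(c("Energy", "Water", "Energy", "Other",
                    "Other", "Water", "Transportation", "Energy"))
)

# Hypothetical helper: counts and within-sector shares of each q3 answer
share_by_sector <- function(df) {
  df |>
    group_by(sector, q3) |>
    summarise(n = n(), .groups = "drop_last") |>
    mutate(pct = n / sum(n)) |>
    ungroup()
}

# NA removed first: shares are relative to the answered rows only
na_first <- survey |>
  filter(!is.na(q3)) |>
  share_by_sector()

# NA removed last: the NA rows still count towards each sector's denominator
na_last <- survey |>
  share_by_sector() |>
  filter(!is.na(q3))

na_first$pct[na_first$sector == "Energy"]  # 1 (2 of 2 answered rows)
na_last$pct[na_last$sector == "Energy"]    # 2/3 (the NA row is in the denominator)
```

With the NA rows removed first, the bars within each sector add up to 100%. Note also that Transportation has only NA answers for q3, so it has no rows left after filtering and does not appear in that plot at all.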
