How do I get wide data tidied into the proper format for a stacked bar facet_grid plot?

Time:03-11

I'm trying to get data into a 3x3 facet_grid plot, but am struggling to find the right tidy combination to make it work.

I can manage to get a single category to facet like so:

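# load the packages used below
library(dplyr)
library(tidyr)
library(ggplot2)
library(hrbrthemes)  # theme_ipsum_rc()
library(ggeasy)      # easy_center_title()
# ingest some data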
df <- structure(list(Q52_bin = structure(c(3L, 2L, 2L, 2L, 2L, 2L), .Label = c("low", 
"medium", "high"), class = "factor"), Q53_bin = structure(c(2L, 
3L, 2L, 2L, 2L, 2L), .Label = c("low", "medium", "high"), class = "factor"), 
    Q57_bin = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("low", 
    "medium", "high"), class = "factor"), Q4 = c("A little", 
    "Some", "Some", "A great deal", "A lot", "Some")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
# Now let's try and develop a faceted plot using the low/med/high bins we've created above under political_lr, spirituality etc.

# make column names coherent and simplified
names(df) <- c("Q52_bin", "Q53_bin", "Q57_bin", "response")
# filter out NA values
df <- filter(df, !is.na(response))
# generate new dataframe with sums per category and sort in descending order
df <- df %>%
  dplyr::count(response, Q52_bin, sort = TRUE) %>% 
  dplyr::mutate(response = forcats::fct_rev(forcats::fct_inorder(response))) 
# make plot
ggplot(df, aes(x = n, y = response)) +
  geom_col(colour = "white") +
  facet_grid(rows = vars(Q52_bin)) +
  ## reduce spacing between labels and bars
  scale_x_continuous(expand = c(.01, .01)) +
  scale_fill_identity(guide = "none") +
  ## get rid of all elements except y axis labels + adjust plot margin
  theme_ipsum_rc() +
  theme(plot.margin = margin(15, 15, 15, 15)) +
  easy_center_title()

By using count() I've dropped every bin column other than Q52_bin. To get the full set, I believe I'll need pivot_longer(), in something like this:

# Now let's try and add in rows to represent other kinds of faceting in a 3x3 visualisation
df <- select(climate_experience_data_named, Q52_bin, Q53_bin, Q57_bin, Q4)
# make column names coherent and simplified
names(df) <- c("Q52_bin", "Q53_bin", "Q57_bin", "response")
# filter out NA values
df <- filter(df, !is.na(response))
# generate new dataframe with sums per category and sort in descending order
# commenting out percentages and labelling in plot as this will need to be handled differently in facets
# for additional faceted columns to work, we need to convert this data to long format so that bin data is integrated into counts
df <- df %>%
  pivot_longer(!response, names_to = "bin_name", values_to = "b")
df <- df %>%
  dplyr::count(response, bin_name, sort = TRUE)

# Broken plot!
ggplot(df, aes(x = bin_name, y = n)) +
  geom_col(colour = "white", stat='identity') +
  facet_grid(rows = vars(?), cols = vars(bin_name))

The goal is to have facet rows for "low", "medium", and "high" (as in the first plot), facet columns for "Q52_bin", "Q53_bin", and "Q57_bin", and stacked bars inside each panel for the Likert-style response categories in this data. I believe this breaks at the point where I call count(), but I can't work out how to reconfigure it, and the plot, obviously, isn't getting anywhere either. I suspect this is just an easy tweak, but it seems to be beyond me!

User response:

I'm not sure I've exactly understood your final desired plot, but I think from your original df you can do this:

names(df) <- c("Q52_bin", "Q53_bin", "Q57_bin", "response")
df %>% 
  pivot_longer(!response, names_to = "bin_name", values_to = "b") %>% 
  count(response, bin_name, b) %>% 
  ggplot(aes(x = n, y = response)) +
  geom_col(color = 'white') +
  facet_grid(vars(bin_name), vars(b))

Output:

likert bar plot
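
If the layout described in the question is what's wanted (rows for low/medium/high, columns for the three questions, one stacked bar per panel), a variation on the code above might look like the following. This is only a sketch: the data frame is rebuilt by hand from the dput() output in the question, and the names `lv`, `counts`, and `p`, the empty-string y position, and `drop = FALSE` are illustrative choices, not part of the original code.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data rebuilt from the question's dput() output (assumed to match it)
lv <- c("low", "medium", "high")
df <- tibble(
  Q52_bin  = factor(c("high", "medium", "medium", "medium", "medium", "medium"), levels = lv),
  Q53_bin  = factor(c("medium", "high", "medium", "medium", "medium", "medium"), levels = lv),
  Q57_bin  = factor(rep("medium", 6), levels = lv),
  response = c("A little", "Some", "Some", "A great deal", "A lot", "Some")
)

# Long format: one row per (respondent, question), keeping the bin value in `b`,
# then count every response/question/bin combination
counts <- df %>%
  pivot_longer(!response, names_to = "bin_name", values_to = "b") %>%
  count(response, bin_name, b)

# One stacked bar per panel: rows are the bin levels, columns are the questions,
# and the fill colour distinguishes the response categories
p <- ggplot(counts, aes(x = n, y = "", fill = response)) +
  geom_col(colour = "white") +
  facet_grid(rows = vars(b), cols = vars(bin_name), drop = FALSE) +
  labs(x = "count", y = NULL)
p
```

Including `b` in count() is what keeps the bin level available for faceting. `drop = FALSE` keeps the "low" row in the grid even though no row of the sample data falls into it; leave it out if empty panels are not wanted.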
