How to make a bar-chart by using two variables on x-axis and a grouped variable on y-axis?

Time:04-27

I hope I asked my question in the right way this time! If not, let me know! I want to code a grouped bar chart similar to a sketch I made in Paint. I drew it flipped, but it actually doesn't matter whether it's flipped or not. So a plot similar to this one would also be very useful: Grouped barchart in r with 4 variables

Both variables, happy and lifesatisfied, are scaled values from 0 to 10. Working hours is a grouped variable with the categories 43+, 37-42, 33-36, 27-32, and <27.

A very similar example of what my data set looks like (I just changed the values and order; I also have many more observations):

Working hours  happy  lifesatisfied  country
37-42          7      9              DK
<27            8      8              SE
43+            7      8              DK
33-36          6      6              SE
37-42          7      5              NO
<27            4      7              NO

I tried to find similar examples and, based on those, tried to code the bar chart in the following way, but it doesn't work:

df2 <- datafilteredwomen %>% 
  pivot_longer(cols = c("happy", "stflife"), names_to = "var", values_to = "Percentage")

ggplot(df2) +
  geom_bar(aes(x = Percentage, y = workinghours, fill = var), stat = "identity", position = "dodge") +
  theme_minimal()

This gives a plot that is not what I want.

Second try:

forplot <- datafilteredwomen %>%
  group_by(workinghours, happy, stflife) %>%
  summarise(count = n()) %>%
  mutate(proportion = count / sum(count))

ggplot(forplot, aes(workinghours, proportion, fill = as.factor(happy))) +
  geom_bar(position = "dodge", stat = "identity", color = "black")

This gives a different plot, but still not the one I want.

Third try, using the ggplot2 builder add-in:

library(dplyr)
library(ggplot2)

datafilteredwomen %>%
  filter(!is.na(workinghours)) %>%
  ggplot() +
  aes(x = workinghours, group = happy, weight = happy) +
  geom_bar(position = "dodge",
           fill = "#112446") +
  theme_classic() +
  scale_y_continuous(labels = scales::percent)

This also gives a plot that is not what I want.

None of my attempts produce what I want. I really hope that someone can help me, if it's possible!

CodePudding user response:

After speaking to the OP I found his data source and came up with this solution. Apologies if it's a bit messy, I have only been using R for 6 months. For ease of reproducibility I have preselected the variables used from the original dataset.

data <- structure(list(wkhtot = c(40, 8, 50, 40, 40, 50, 39, 48, 45, 
16, 45, 45, 52, 45, 50, 37, 50, 7, 37, 36), happy = c(7, 8, 10, 
10, 7, 7, 7, 6, 8, 10, 8, 10, 9, 6, 9, 9, 8, 8, 9, 7), stflife = c(8, 
8, 10, 10, 7, 7, 8, 6, 8, 10, 9, 10, 9, 5, 9, 9, 8, 8, 7, 7)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

Here are the packages required.

library(tidyverse) # attaches dplyr, tidyr (for pivot_longer) and ggplot2

Here I have manipulated the data and commented my reasoning.

data <- data %>%
  select(wkhtot, happy, stflife) %>% #Select the wanted variables
  rename(Happy = happy) %>% #Rename for graphical sake
  rename("Life Satisfied" = stflife) %>%
  na.omit() %>% # remove NA values
  group_by(WorkingHours = cut(wkhtot, c(-Inf, 27, 32,36,42,Inf))) %>% #Create the ranges
  select(WorkingHours, Happy, "Life Satisfied") %>% #Select the variables again
  pivot_longer(cols = c(`Happy`, `Life Satisfied`), names_to = "Criterion", values_to = "score") %>% # pivot the df longer for plotting
  group_by(WorkingHours, Criterion)

data$Criterion <- as.factor(data$Criterion) #Make criterion a factor for graphical reasons
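One detail worth checking in the binning step: cut() builds right-closed intervals by default, so a value of exactly 27 lands in the lowest bin, whereas the question's categories put it in 27-32. The following is a minimal sketch, assuming whole-hour values and the category names from the question; the `hours` vector and the `working_hours` name are illustrative only and not part of the original answer.

```r
# Illustrative weekly working hours (27 and 43 included as edge cases)
hours <- c(40, 8, 50, 27, 36, 43)

# right = FALSE makes each interval closed on the left:
# [-Inf, 27), [27, 33), [33, 37), [37, 43), [43, Inf)
working_hours <- cut(
  hours,
  breaks = c(-Inf, 27, 33, 37, 43, Inf),
  labels = c("<27", "27-32", "33-36", "37-42", "43+"),
  right  = FALSE
)

# Count observations per category
table(working_hours)
```

Passing `labels` also gives the axis readable category names instead of interval notation such as (27,32].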

A bit more data prep

# Creating the percentage
data.plot <- data %>%
  group_by(WorkingHours, Criterion) %>%
  summarise(score = sum(score), .groups = "drop") %>% # sum of scores per working-hours group and criterion
  group_by(WorkingHours) %>%
  mutate(tot = sum(score)) %>% # total score within each working-hours group
  mutate(freq = round(score / tot * 100, digits = 2)) # each criterion's share of that total, in percent

Creating the plot.

# Plotting
ggplot(data.plot, aes(x = WorkingHours, y = freq, fill = Criterion)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = freq),
            position = position_dodge(width = 0.9),
            vjust = 1) +
  xlab("Working Hours") +
  ylab("Percentage")

Please let me know if there is a more concise or easier way!!

B

DataSource: https://www.europeansocialsurvey.org/downloadwizard/?fbclid=IwAR2aVr3kuqOoy4mqa978yEM1sPEzOaghzCrLCHcsc5gmYkdAyYvGPJMdRp4

CodePudding user response:

Taking this example dataframe df:

df <- structure(list(Working.hours = c("37-42", "37-42", "<27", "<27", 
"43+", "43+", "33-36", "33-36", "37-42", "37-42", "<27", "<27"
), country = c("DK", "DK", "SE", "SE", "DK", "DK", "SE", "SE", 
"NO", "NO", "NO", "NO"), criterion = c("happy", "lifesatisfied", 
"happy", "lifesatisfied", "happy", "lifesatisfied", "happy", 
"lifesatisfied", "happy", "lifesatisfied", "happy", "lifesatisfied"
), score = c(7L, 9L, 8L, 8L, 7L, 8L, 6L, 6L, 7L, 5L, 4L, 7L)), row.names = c(NA, 
-12L), class = c("tbl_df", "tbl", "data.frame"))

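you can proceed like this (the df above is already in long format, so the pivot_longer() step is only needed when you start from wide data with separate happy and lifesatisfied columns):

library(dplyr)
library(tidyr) # provides pivot_longer()
library(ggplot2)

# Only for wide data; skip this step for the long df above
# df <-
#     df %>%
#     pivot_longer(cols = c(happy, lifesatisfied),
#                  names_to = 'criterion',
#                  values_to = 'score'
#                  )

df %>%
    ggplot(aes(x = Working.hours,
               y = score,
               fill = criterion)) +
        geom_col(position = 'dodge') +
        coord_flip()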

For picking colours, see ?scale_fill_manual; for formatting the legend and so on, see the numerous existing answers to related questions on Stack Overflow.
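As a rough illustration of the colour and legend hint, here is a minimal sketch using scale_fill_manual(). The small `scores` data frame, the hex colours, and the legend labels are arbitrary choices for demonstration, not taken from the question's data.

```r
library(ggplot2)

# Small long-format data frame with the same columns as df (illustrative values)
scores <- data.frame(
  Working.hours = c("<27", "<27", "33-36", "33-36"),
  criterion     = c("happy", "lifesatisfied", "happy", "lifesatisfied"),
  score         = c(8, 8, 6, 6)
)

p <- ggplot(scores, aes(x = Working.hours, y = score, fill = criterion)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_manual(
    name   = "Criterion",                                      # legend title
    values = c(happy = "#1b9e77", lifesatisfied = "#d95f02"),  # one colour per level (arbitrary)
    labels = c("Happy", "Life satisfied")                      # in level order: happy, lifesatisfied
  )

p
```

Naming the entries of `values` ties each colour to a criterion level explicitly, so the mapping does not depend on the order of the levels.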
