Use of pivot_wider to plot the evolution of variables in R

Time:08-08

I would like to plot the evolution of the number of workers per category ("A", "D", "F", "I") from 2017 to 2021 as a stacked bar chart, one bar per year, with a label in the middle of each segment. My dataset isn't in the right shape for this. From what I have seen here, I think I need pivot_wider() or pivot_longer(), but I don't really know how to use these functions. Could anyone help?

Here is the structure of my dataset, for reproducibility:

 structure(list(A = c("10", "7", "8", "8", "9", "Total"), D = c(23, 
 14, 29, 35, 16, 117), F = c(8, 7, 11, 6, 6, 38), I = c(449, 498, 
 415, 470, 531, 2363), annee = c("2017", "2018", "2019", "2020", 
 "2021", NA)), core = structure(list(A = c("10", "7", "8", "8", 
 "9"), D = c(23, 14, 29, 35, 16), F = c(8, 7, 11, 6, 6), I = c(449, 
 498, 415, 470, 531)), class = "data.frame", row.names = c(NA, 
 -5L)), tabyl_type = "two_way", totals = "row", row.names = c(NA, 
 6L), class = c("tabyl", "data.frame"))

CodePudding user response:

library(tidyverse)
library(ggrepel)

df <- structure(list(A = c("10", "7", "8", "8", "9", "Total"), D = c(
  23,
  14, 29, 35, 16, 117
), F = c(8, 7, 11, 6, 6, 38), I = c(
  449, 498,
  415, 470, 531, 2363
), annee = c(
  "2017", "2018", "2019", "2020",
  "2021", NA
)), core = structure(list(A = c(
  "10", "7", "8", "8",
  "9"
), D = c(23, 14, 29, 35, 16), F = c(8, 7, 11, 6, 6), I = c(
  449,
  498, 415, 470, 531
)), class = "data.frame", row.names = c(
  NA,
  -5L
)), tabyl_type = "two_way", totals = "row", row.names = c(
  NA,
  6L
), class = c("tabyl", "data.frame"))   

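df |> 
  filter(!is.na(annee)) |>                       # drop the "Total" row (its annee is NA)
  mutate(A = as.double(A)) |>                    # A is stored as character (it holds the "Total" label)
  pivot_longer(-annee, names_to = "category") |> # one row per year and category
  ggplot(aes(annee, value, fill = category, label = value)) +
  geom_col() +
  geom_label_repel(position = position_stack(), max.overlaps = 20)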

Created on 2022-08-08 by the reprex package (v2.0.1)
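
To make the reshaping step easier to follow, here is a minimal sketch of what pivot_longer() produces on its own, before anything is plotted. It assumes a plain data frame holding only the five yearly rows from the question (the tabyl attributes and the "Total" row are left out for simplicity); the names `wide` and `long` are illustrative and not part of the original answer.

```r
library(tidyr)

# Plain data frame with the yearly counts from the question ("Total" row omitted)
wide <- data.frame(
  annee = c("2017", "2018", "2019", "2020", "2021"),
  A = c(10, 7, 8, 8, 9),
  D = c(23, 14, 29, 35, 16),
  F = c(8, 7, 11, 6, 6),
  I = c(449, 498, 415, 470, 531)
)

# Every column except annee is stacked into two columns:
# "category" (the old column name) and "value" (the count).
# 5 years x 4 categories gives 20 rows.
long <- pivot_longer(wide, cols = -annee, names_to = "category", values_to = "value")

head(long, 4)
```

The long table has one row per year and category, which is the shape ggplot() needs in order to map `category` to `fill`. pivot_wider() does the opposite reshaping, so it is not needed here.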

CodePudding user response:

Once you remove the total row and ensure that columns A through I are numeric, you can use pivot_longer() and pass the result to ggplot() like this:

# data is the data frame from the question
data %>% 
  filter(A != "Total") %>% 
  mutate(across(A:I, as.numeric)) %>% 
  pivot_longer(cols = -annee, names_to = "group", values_to = "ct") %>% 
  ggplot(aes(annee, ct, fill = group)) +
  geom_col()

I did not add the category labels because group I dominates each year, which leaves little room for the other groups' labels; you might want to reconsider that visualization.
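
If you still want a label centred in each segment, as the question asks, one possible sketch uses geom_text() with position_stack(vjust = 0.5). This is an illustration rather than part of the answer above: it assumes a plain data frame with the five yearly rows already converted to numeric, and the names `wide`, `long` and `p` are made up for the example.

```r
library(tidyr)
library(ggplot2)

# Yearly counts from the question as a plain, numeric data frame ("Total" row omitted)
wide <- data.frame(
  annee = c("2017", "2018", "2019", "2020", "2021"),
  A = c(10, 7, 8, 8, 9),
  D = c(23, 14, 29, 35, 16),
  F = c(8, 7, 11, 6, 6),
  I = c(449, 498, 415, 470, 531)
)

long <- pivot_longer(wide, cols = -annee, names_to = "group", values_to = "ct")

p <- ggplot(long, aes(annee, ct, fill = group)) +
  geom_col() +
  # vjust = 0.5 puts each label at the vertical centre of its stacked segment
  geom_text(aes(label = ct), position = position_stack(vjust = 0.5), size = 3)

p
```

Because group I is so much larger than the others, the labels for A, D and F sit in very thin segments and are likely to overlap, which is the concern raised above.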
