mistake when using dplyr, trying to plot a variable in proportion of the total

Time:08-04

I have a dataset with the following structure (output of dput(head(df))):

 structure(list(type_de_sejour = c("Amb", "Hosp", 
 "Hosp", "Amb", "Hosp", "Sea"), 
 specialite = c("ANES", "ANES", 
 "Autres", "CARD", "CARD", "CARD"
 ), CA_annee_N = c(2712L, 122180L, 0L, 822615L, 6905494L, 
 0L), nb_sejours_N = c(8L, 32L, 0L, 1052L, 2776L, 0L), nb_doc_N = c(5L, 
 8L, 0L, 12L, 15L, 0L), CA_annee_N1 = c(4231L, 78858L, 6587L, 
 327441L, 6413083L, 0L), nb_sejours_N1 = c(13L, 29L, 2L, 532L, 
 2819L, 0L), nb_doc_N1 = c(6L, 9L, 1L, 12L, 12L, 0L
 ), CA_annee_N2 = c(4551L, 27432L, 0L, 208326L, 7465440L, 
 575L), nb_sejours_N2 = c(15L, 8L, 0L, 463L, 3393L, 1L), nb_doc_N2 = c(6L, 
 4L, 0L, 11L, 13L, 1L), site = c("FR", "FR", "FR", "FR", 
 "FR", "FR")), row.names = c(NA, 6L), class = "data.frame")

I am trying to plot the percentage that each "specialite" represents of the total "nb_sejours_N", separately per "site" (ideally by faceting, or with two plots, one per site), after filtering on type_de_sejour == "Amb".

I tried the following code:

df %>%
    mutate(volume_N == nb_sejours_N,
           volume_N1 == nb_sejours_N1,
           volume_N2 == nb_sejours_N2)%>%
   filter(type_de_sejour == "Amb")%>%
   group_by(site) %>%
   mutate(proportion_N = volume_N/sum(volume_N, na.rm = TRUE),
          proportion_N1 = volume_N1/sum(volume_N1, na.rm = TRUE),
          proportion_N2 = volume_N2/sum(volume_N2, na.rm = TRUE))

Unfortunately, it doesn't work, so I can't go any further. Does anyone also know an efficient way to plot what I'm trying to show?

CodePudding user response:

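The first mutate() in your code fails because == compares rather than assigns, so volume_N, volume_N1 and volume_N2 are never created; use = there instead. For the plot itself, I believe the following works: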


library(dplyr)
library(tidyr)
library(ggplot2)

# creating plot (pivot_longer() puts the old column names in "name" by default)
p = df %>% filter(type_de_sejour == "Amb") %>% 
  pivot_longer(cols = c("nb_sejours_N","nb_sejours_N1","nb_sejours_N2"), values_to = "visit") %>% 
  ggplot(aes(fill=name, y=visit, x=name)) + geom_bar(position="stack", stat="identity")


# creating summary of totals for each column
totals = df %>% filter(type_de_sejour == "Amb") %>% 
  pivot_longer(cols = c("nb_sejours_N","nb_sejours_N1","nb_sejours_N2"), values_to = "visit") %>% 
  group_by(name) %>% summarise(total = sum(visit))


# adding totals on top of bars to plot
p + geom_text(aes(name, total, label = total, fill = NULL), data = totals)
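
The plot above shows raw counts per year rather than the share of each "specialite". As a rough sketch of the proportion-per-site version the question asks for, something like the following might work. Note that df_demo, its values and the second site "BE" are made up for illustration; only the column names come from the question. It sums nb_sejours_N per site and specialite, divides by the site total, and facets by site:

```r
library(dplyr)
library(ggplot2)

# Hypothetical demo data: same column names as the question, invented values,
# and an invented second site ("BE") so that the faceting is visible.
df_demo <- data.frame(
  type_de_sejour = c("Amb", "Amb", "Hosp", "Amb", "Amb"),
  specialite     = c("ANES", "CARD", "CARD", "ANES", "CARD"),
  site           = c("FR", "FR", "FR", "BE", "BE"),
  nb_sejours_N   = c(8L, 1052L, 2776L, 30L, 70L)
)

prop_df <- df_demo %>%
  filter(type_de_sejour == "Amb") %>%
  # one row per site/specialite, in case a specialite appears several times
  group_by(site, specialite) %>%
  summarise(volume_N = sum(nb_sejours_N, na.rm = TRUE), .groups = "drop") %>%
  # share of each specialite within its own site
  group_by(site) %>%
  mutate(proportion_N = volume_N / sum(volume_N)) %>%
  ungroup()

# one panel per site, y axis shown as a percentage
p_prop <- ggplot(prop_df, aes(x = specialite, y = proportion_N, fill = specialite)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~ site)
```

Printing p_prop draws the chart. The same pattern should extend to nb_sejours_N1 and nb_sejours_N2 by pivoting them to long format first, as in the code above, and grouping by name as well as site.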
