ggplot: plotting the evolution of a value within a time series (MM-DD) through the years

Time:01-26

I'm trying to visualize the evolution of snow height throughout the years within the ski season.

I've been able to create a plot for each independent season, but I'm unable to stack them on top of each other.

My data frame (DAll) looks like this:

| Date       | SnowHeight |Season | Date_without_year |
| ---------- | ---------- |------ | ----------------- |
| 2017-12-01 | 33         | 2018  | 12-01             |
| 2017-12-02 | 37.5       | 2018  | 12-02             |
| 2017-12-03 | 39.5       | 2018  | 12-03             |
...
| 2018-04-28 | 144        | 2018  | 04-28             |
| 2018-04-29 | 132        | 2018  | 04-29             |
| 2018-04-30 | 120        | 2018  | 04-30             |
.......
| 2018-12-01 | 21         | 2019  | 12-01             |
| 2018-12-02 | 21.2       | 2019  | 12-02             |
| 2018-12-03 | 24         | 2019  | 12-03             |
...
| 2019-04-28 | 99         | 2019  | 04-28             |
| 2019-04-29 | 86         | 2019  | 04-29             |
| 2019-04-30 | 85         | 2019  | 04-30             |

I have data from 12-01 (December 1st) to 04-30 (April 30th) for each year, and would like to keep my x-axis on the same time range.

So one of the difficulties I'm having is that, because I want to graph a ski season (December to April), the data spans two different calendar years...

I've tried a few things, but the one that seemed most promising is:

plot <- ggplot(DAll, aes(x = Date_without_year, y = SnowHeight, color = Season)) +
        geom_area(position = "stack") +
        labs(x = "Date", y = "Snow Height", title = "Snow Height Evolution (Dec-Apr) by Season")

but I get an empty graph (I checked that the value is numeric and should work with the date range)...

Maybe I'm misunderstanding the tools or just simply making an error at the beginning...any help will be appreciated :)

CodePudding user response:

The reason you get an empty chart is that you didn't set the group aesthetic. As a more or less general rule, it is required for a line or area chart when the x variable is discrete or categorical, as your Date_without_year is.
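To illustrate the point about the group aesthetic, here is a minimal sketch that keeps the discrete Date_without_year axis. The small data frame `snow` and the vector `season_order` are names I made up for this example (the values are taken from the question), and the factor levels are set by hand on the assumption that you want December before April rather than alphabetical order.

```r
library(ggplot2)

# Small sample mirroring the question's columns (values copied from the post).
snow <- data.frame(
  Date_without_year = c("12-01", "12-02", "04-29", "04-30",
                        "12-01", "12-02", "04-29", "04-30"),
  SnowHeight = c(33, 37.5, 132, 120, 21, 21.2, 86, 85),
  Season = c(2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019)
)

# Assumption: order the discrete axis by ski season (December first),
# otherwise the character values would sort alphabetically ("04-..." first).
season_order <- c("12-01", "12-02", "04-29", "04-30")
snow$Date_without_year <- factor(snow$Date_without_year, levels = season_order)

# group = Season tells ggplot which points belong to the same line.
p <- ggplot(snow, aes(
  x = Date_without_year, y = SnowHeight,
  group = Season, color = factor(Season)
)) +
  geom_line()

p
```

With a full season of daily values, writing the factor levels by hand gets tedious quickly, which is one more reason to prefer real dates on the x-axis.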

However, since you want a chart that shows and compares snow height across ski seasons, I would suggest sticking with proper dates to get the right order. But instead of using the true dates, fix the years, e.g. use the year 2017 for December and 2018 for the other months. Afterwards you can format the dates via scale_x_date.

Finally, in my opinion a stacked area chart does not make much sense for comparing or visualizing snow height across ski seasons. Instead I would suggest using a line chart.

library(dplyr)
library(ggplot2)

DAll <- DAll |>
  mutate(
    Date_with_year = if_else(
      grepl("^12", Date_without_year),
      paste0("2017-", Date_without_year),
      paste0("2018-", Date_without_year)
    ),
    Date_with_year = as.Date(Date_with_year)
  )

base <- ggplot(DAll, aes(
  x = Date_with_year, y = SnowHeight,
  group = Season
)) +
  scale_x_date(date_labels = "%m-%d") +
  labs(
    x = "Date", y = "Snow Height",
    title = "Snow Height Evolution (Dec-Apr) by Season"
  )

base +
  geom_area(aes(fill = factor(Season)))

base +
  geom_line(aes(color = factor(Season)))
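One thing to watch with the fixed-year approach: 2018 is not a leap year, so a season that contains 02-29 would have no valid date to map to. As a rough sketch (not part of the code above), the hypothetical helper `to_reference_season()` below anchors December to 2019 and the remaining months to 2020, which is a leap year. Both the helper name and the choice of reference years are my own assumptions, and it relies on the data only covering December to April.

```r
# Hypothetical helper: map "MM-DD" strings onto one reference ski season.
# Assumption: December goes to 2019, all other months to 2020 (a leap year),
# so that "02-29" still parses to a valid date.
to_reference_season <- function(mm_dd) {
  year <- ifelse(startsWith(mm_dd, "12"), "2019", "2020")
  as.Date(paste(year, mm_dd, sep = "-"), format = "%Y-%m-%d")
}

ref <- to_reference_season(c("12-01", "02-29", "04-30"))
ref
```

The result can be used in place of Date_with_year; since scale_x_date(date_labels = "%m-%d") hides the year, the reference years never show up on the axis.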

DATA

DAll <- structure(list(Date = c(
  "2017-12-01", "2017-12-02", "2017-12-03",
  "2018-04-28", "2018-04-29", "2018-04-30", "2018-12-01", "2018-12-02",
  "2018-12-03", "2019-04-28", "2019-04-29", "2019-04-30"
), SnowHeight = c(
  33,
  37.5, 39.5, 144, 132, 120, 21, 21.2, 24, 99, 86, 85
), Season = c(
  2018L,
  2018L, 2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L, 2019L,
  2019L, 2019L
), Date_without_year = c(
  "12-01", "12-02", "12-03",
  "04-28", "04-29", "04-30", "12-01", "12-02", "12-03", "04-28",
  "04-29", "04-30"
)), class = "data.frame", row.names = c(NA, -12L))