Automate mutate() and case_when() for each value in dplyr

Time:05-16

I have a data frame named "data" with a column "date" that gives the date of each event.

I want to add a column that describes the "winter season" each event falls in (in practice, the period from August of one year to August of the next).

To do that, I use mutate() with a case_when() that takes the date and returns the string for the corresponding winter.

For example, if a given row has the date 12 December 2017, the code block fills the new column "winter season" with the string "winter 2017-2018".

This works very well, but the code is heavy and I had to write it by hand, one season at a time. I feel it could be simplified into something more elegant that works for any year: for any date between the first of August of year X and the first of August of year X+1, it should return the value "winter X-(X+1)".

How can I write this block of code more efficiently?

data <- data %>%
  mutate(winter = case_when(
    between(date, as.Date("2013-08-01"), as.Date("2014-08-01")) == TRUE ~ "winter 2013-2014",
    between(date, as.Date("2014-08-01"), as.Date("2015-08-01")) == TRUE ~ "winter 2014-2015",
    between(date, as.Date("2015-08-01"), as.Date("2016-08-01")) == TRUE ~ "winter 2015-2016",
    between(date, as.Date("2016-08-01"), as.Date("2017-08-01")) == TRUE ~ "winter 2016-2017",
    between(date, as.Date("2017-08-01"), as.Date("2018-08-01")) == TRUE ~ "winter 2017-2018",
    between(date, as.Date("2018-08-01"), as.Date("2019-08-01")) == TRUE ~ "winter 2018-2019",
    between(date, as.Date("2019-08-01"), as.Date("2020-08-01")) == TRUE ~ "winter 2019-2020",
    between(date, as.Date("2020-08-01"), as.Date("2021-08-01")) == TRUE ~ "winter 2020-2021",
    between(date, as.Date("2021-08-01"), as.Date("2022-08-01")) == TRUE ~ "winter 2021-2022",
    between(date, as.Date("2022-08-01"), as.Date("2023-08-01")) == TRUE ~ "winter 2022-2023",
    TRUE ~ "other"
  ))

CodePudding user response:

A minimal reproducible example would make it a lot easier to understand the problem, but perhaps this will help:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

data <- data.frame(name = 1:100,
                   date = seq.Date(from = as.Date("2013-08-01"),
                                   to = as.Date("2022-08-01"),
                                   length.out = 100))

data %>%
  mutate(season = case_when(quarter(date, fiscal_start = 0) == 1 ~ "Summer",
                            quarter(date, fiscal_start = 0) == 2 ~ "Autumn",
                            quarter(date, fiscal_start = 0) == 3 ~ "Winter",
                            quarter(date, fiscal_start = 0) == 4 ~ "Spring"),
         year = paste(year(date) - 1, year(date), sep = "-")) %>%
  unite("season_and_year", c(season, year), sep = " ", remove = TRUE) %>%
  head()
#>   name       date  season_and_year
#> 1    1 2013-08-01 Winter 2012-2013
#> 2    2 2013-09-03 Spring 2012-2013
#> 3    3 2013-10-06 Spring 2012-2013
#> 4    4 2013-11-08 Spring 2012-2013
#> 5    5 2013-12-11 Summer 2012-2013
#> 6    6 2014-01-14 Summer 2013-2014

Created on 2022-05-16 by the reprex package (v2.0.1)

CodePudding user response:

You can calculate the month and year from your date. If month < 8 (before August), the winter runs from year-1 to year. Otherwise, the winter runs from year to year+1.

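library(dplyr)

data <- data %>%
  mutate(
    month = as.numeric(format(date, "%m")),
    year = as.numeric(format(date, "%Y")),
    # January to July belong to the winter that started the previous August
    winter = case_when(month < 8 ~ paste0("Winter ", year - 1, "-", year),
                       TRUE ~ paste0("Winter ", year, "-", year + 1))
  ) %>%
  # drop the helper columns once the label is built
  select(-month, -year)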
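
The same month-and-year rule can also be wrapped in a small reusable function. This is only a sketch in base R: the name winter_label and its start_month argument are hypothetical and do not come from the question or from any package. It assumes the season starts on the first day of start_month.

```r
# Hypothetical helper (not from any package): label each date with the
# season that runs from `start_month` of one year to the same month of the next.
winter_label <- function(date, start_month = 8) {
  month <- as.integer(format(date, "%m"))
  year  <- as.integer(format(date, "%Y"))
  # Dates before the start month belong to the season that began the previous year;
  # the logical (month < start_month) is coerced to 0 or 1 in the subtraction.
  start_year <- year - (month < start_month)
  paste0("winter ", start_year, "-", start_year + 1)
}

dates <- as.Date(c("2017-12-12", "2018-07-31", "2018-08-01"))
winter_label(dates)
```

Inside a pipeline it could be called as mutate(winter = winter_label(date)). One difference from the original case_when() is worth checking against your needs: between() includes both endpoints, so the hand-written version assigns 1 August to the earlier season (the first matching branch wins), whereas this rule puts 1 August in the season that starts that day. It also labels every year rather than falling back to a default value outside 2013-2023.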
          