I have daily time series data. I want to identify all rows in the data that correspond to the month of January. For these rows, I want to update the year column so that it is shifted back by one year. This will allow the January rows to be accounted for in the previous year's season rather than the current year.
Here is reproducible code that generates data resembling mine:
library(dplyr)
library(tibble)
library(lubridate)  # provides year(), month() and day()
# Set the seed for reproducibility
set.seed(123)
# Create a sequence of dates from 2001 to 2005
dates <- seq(as.Date("2001-01-01"), as.Date("2005-12-31"), by = "day")
# Create a tibble with the dates and random numbers for var1 to var4
df <- tibble(year = year(dates), month = month(dates), day = day(dates),
var1 = runif(length(dates)), var2 = runif(length(dates)),
var3 = runif(length(dates)), var4 = runif(length(dates)))
df
Any thoughts please?
CodePudding user response:
For a dplyr approach, you could use mutate with case_when. I added a new variable (countyear) to demonstrate; just mutate year itself if you really want to overwrite it.
library(dplyr)
library(tibble)
library(lubridate)
# Set the seed for reproducibility
set.seed(123)
# Create a sequence of dates from 2001 to 2005
dates <- seq(as.Date("2001-01-01"), as.Date("2005-12-31"), by = "day")
# Create a tibble with the dates and random numbers for var1 to var4
df <- tibble(year = year(dates), month = month(dates), day = day(dates),
var1 = runif(length(dates)), var2 = runif(length(dates)),
var3 = runif(length(dates)), var4 = runif(length(dates)))
# Add a new grouping variable: January rows count toward the previous year
df <- df %>% mutate(countyear = case_when(month == 1 ~ year - 1, TRUE ~ year))
> head(df)
# A tibble: 6 x 8
   year month   day   var1  var2  var3  var4 countyear
  <dbl> <dbl> <int>  <dbl> <dbl> <dbl> <dbl>     <dbl>
1  2001     1     1 0.576  0.455 0.517 0.857      2000
2  2001     1     2 0.741  0.934 0.381 0.593      2000
3  2001     1     3 0.0914 0.264 0.717 0.907      2000
4  2001     1     4 0.541  0.818 0.981 0.910      2000
5  2001     1     5 0.603  0.118 0.768 0.586      2000
6  2001     1     6 0.222  0.888 0.614 0.716      2000
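To show what the extra column is for, here is a minimal sketch that groups by countyear so that each January is summarised together with the preceding year. The tiny data set and the names toy and season_means are made up for illustration; they are not part of the original data.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature data: two December rows and two January rows
toy <- tibble(
  year  = c(2001, 2001, 2002, 2002),
  month = c(12, 12, 1, 1),
  var1  = c(1, 3, 5, 7)
)

# January rows are counted toward the previous year
toy <- toy %>%
  mutate(countyear = case_when(month == 1 ~ year - 1, TRUE ~ year))

# Summarise per season year rather than per calendar year
season_means <- toy %>%
  group_by(countyear) %>%
  summarise(mean_var1 = mean(var1))

season_means
```

Because January 2002 is assigned to 2001, all four rows end up in a single group, which is probably the behaviour you want for a season that spans the turn of the year.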
CodePudding user response:
You can store the year as an integer, then select the January rows (month "01") and subtract one year from them:
library(dplyr)
library(tibble)
# Set the seed for reproducibility
set.seed(123)
# Create a sequence of dates from 2001 to 2005
dates <- seq(as.Date("2001-01-01"), as.Date("2005-12-31"), by = "day")
# Create a tibble with the dates and random numbers for var1 to var4
df <- tibble(year = as.integer(format(dates, format="%Y")), month = format(dates, format="%m"), day = format(dates, format="%d"),
var1 = runif(length(dates)), var2 = runif(length(dates)),
var3 = runif(length(dates)), var4 = runif(length(dates)))
df$year[df$month == "01"] <- df$year[df$month == "01"] - 1
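One thing to watch with this approach: format(dates, "%m") returns zero-padded character strings, so the comparison has to be against "01" (comparing with the number 1 would match nothing). The sketch below shows the same idea on a made-up three-row data frame, storing the condition once in a logical vector; toy and is_jan are illustrative names only.

```r
# Hypothetical miniature data: integer year, zero-padded character month
toy <- data.frame(
  year  = c(2001L, 2001L, 2002L),
  month = c("01", "02", "01"),
  stringsAsFactors = FALSE
)

# Build the January mask once and reuse it on both sides of the assignment
is_jan <- toy$month == "01"
toy$year[is_jan] <- toy$year[is_jan] - 1L  # 1L keeps the column integer

toy$year
```

Keeping the mask in a variable avoids evaluating the condition twice and makes it harder for the two sides of the assignment to drift apart.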