Home > Net >  Move the values of variable up a month in dataset - R
Move the values of variable up a month in dataset - R

Time:11-22

I have a variable x in my dataset (Jan 2018 - June 2021):

    date       | specific_date | x
    2018-01-31 | 2019-10-31    | 0
                   ...
    2019-08-31 | 2019-10-31    | 0
    2019-09-30 | 2019-10-31    | 0
    2019-10-31 | 2019-10-31    | 1
    2019-11-30 | 2019-10-31    | NA
    2019-12-31 | 2019-10-31    | NA
                   ...
    2021-06-30 | 2019-10-31    | NA

I would like to move everything up a month in x, like this (1 goes from date=2019-10-31 to date=2019-09-30, and so on):

date       | specific_date | variable
2018-01-31 | 2019-10-31    | 0
                   ...
2019-08-31 | 2019-10-31    | 0
2019-09-30 | 2019-10-31    | 1
2019-10-31 | 2019-10-31    | NA
2019-11-30 | 2019-10-31    | NA
2019-12-31 | 2019-10-31    | NA
                   ...
2021-06-30 | 2019-10-31    | NA

Any ideas on how to do this?

CodePudding user response:

Assuming the input shown in reproducible form in the Note at the end and assuming you want to move the date column x months earlier convert it to yearmon class, subtract x/12 (since yearmon represents each date as a year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec) or subtract 0 if x is NA and then use as.Date to convert to Date class. frac=1 means convert to a date at the end of the month.

Another possibility is to just use yearmon class and not Date class since that directly represents a year and month without a day and the end-of-month dates shown seem just to be a way of representing a year and month. In that case omit the as.Date and its frac= argument.

library(dplyr, exclude = c("lag", "filter"))
library(zoo)    

DF %>%
  mutate(date = as.Date(as.yearmon(date) - coalesce(x, 0)/12, frac = 1))

giving:

        date specific_date  x
1 2018-01-31    2019-10-31  0
2 2019-08-31    2019-10-31  0
3 2019-09-30    2019-10-31  0
4 2019-09-30    2019-10-31  1
5 2019-11-30    2019-10-31 NA
6 2019-12-31    2019-10-31 NA
7 2021-06-30    2019-10-31 NA

Note

The input data frame in reproducible form:

Lines <- "    date       | specific_date | x
    2018-01-31 | 2019-10-31    | 0
                   ...
    2019-08-31 | 2019-10-31    | 0
    2019-09-30 | 2019-10-31    | 0
    2019-10-31 | 2019-10-31    | 1
    2019-11-30 | 2019-10-31    | NA
    2019-12-31 | 2019-10-31    | NA
                   ...
    2021-06-30 | 2019-10-31    | NA"
DF <- read.table(text = Lines, header = TRUE, sep = "|", strip.white = TRUE,
 comment.char = ".")
  • Related