Home > Software engineering >  how to filter to one year ago from the most recent year R
how to filter to one year ago from the most recent year R

Time:10-08

I have a data frame as follows:

B = data.frame(
  week = c("2021-08-09", "2021-11-11", "2017-06-18", "2020-09-07", "2020-09-07", "2020-09-07", "2020-09-12",
           "2020-08-23", "2019-12-22", "2017-10-29"),
  store = c(14071, 11468, 2428, 17777, 14821, 10935,  5127, 14772, 14772, 14772),
  fill = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

I am trying to filter by dates that are within the most recent two years so in this case 2021 and 2020. However I am not trying to hardcode explicitly 2021 and 2020 as when this data gets refreshed it will go into years 2022 and so on. For example if the two most recent years are 2023 and 2022 then I would like to keep all "week" values of dates that contain those years. Essentially I am aiming to get data that is a year ago from the current year (aka 2 years of data). The output I am trying to get based on my above example is:

B = data.frame(
  week = c("2021-08-09", "2021-11-11", "2020-09-07", "2020-09-07", "2020-09-07", "2020-09-12", "2020-08-23"),
  store = c(14071, 11468, 17777, 14821, 10935,  5127, 14772),
  fill = c(1, 1, 1, 1, 1, 1, 1)
)

Any idea on how I can filter this way? Thanks!

CodePudding user response:

First, the date-looking things should be real Date-class objects, so I'll start with

B$week <- as.Date(B$week)

(Not strictly required, but likely a good thing anyway.)

From here,

oneyearago <- with(list(r = as.POSIXlt(max(B$week))), { r$year <- r$year - 1; r; })
twoyears <- as.Date(format(oneyearago, format = "%Y-01-01"))
B[B$week >= twoyears,]
#         week store fill
# 1 2021-08-09 14071    1
# 2 2021-11-11 11468    1
# 4 2020-09-07 17777    1
# 5 2020-09-07 14821    1
# 6 2020-09-07 10935    1
# 7 2020-09-12  5127    1
# 8 2020-08-23 14772    1

CodePudding user response:

Here is a base R way.
First, coerce week to class "Date".

B$week <- as.Date(B$week)

Now, get the years vector with format, then sort and keep the last two with tail and finally, filter the data.frame.

yr <- as.integer(format(B$week, "%Y"))
last_2 <- tail(sort(unique(yr)), n = 2)
B[yr >= min(last_2), ]
#        week store fill
#1 2021-08-09 14071    1
#2 2021-11-11 11468    1
#3 2020-09-07 17777    1
#4 2020-09-07 14821    1
#5 2020-09-07 10935    1
#6 2020-09-12  5127    1
#7 2020-08-23 14772    1

CodePudding user response:

We can use tidyverse

library(dplyr)
library(lubridate)
B %>%
    mutate(year = year(ymd(week))) %>%
    arrange(desc(year)) %>% 
    filter(year >= min(head(unique(year), 2))) %>%
    select(-year)

-output

        week store fill
1 2021-08-09 14071    1
2 2021-11-11 11468    1
3 2020-09-07 17777    1
4 2020-09-07 14821    1
5 2020-09-07 10935    1
6 2020-09-12  5127    1
7 2020-08-23 14772    1
  • Related