I want to extract the past 3 weeks' data for each household_id
, channel
combination. These past 3 weeks will be calculated from mala_fide_week
and mala_fide_year
and it will be less than that for each household_id
and channel
combination.
Below is the dataset:
for e.g. Household_id
100 for channel A: the mala_fide_week
is 42 and mala_fide_year
2021. So past three records will be less than week 42 of the year 2021. This will be calculated from the week
and year
columns.
For the Household_id
100 and channel
B combination, there are only two records much less than mala_fide_week
and mala_fide_year
.
For Household_id
101 and channel
C, there are two years involved in 2019 and 2020.
The final dataset will be as below
Household_id
102 is not considered as week and year is greater than mala_fide_week
and mala_fide_year
.
I am trying multiple options but not getting through. Any help is much appreciated!
sample dataset:
data <- data.frame(Household_id =
c(100,100,100,100,100,100,101,101,101,101,102,102),
channel = c("A","A","A","A","B","B","C","C","c","C","D","D"),
duration = c(12,34,567,67,34,67,98,23,56,89,73,76),
mala_fide_week = c(42,42,42,42,42,42,5,5,5,5,30,30),
mala_fide_year =c(2021,2021,2021,2021,2021,2021,2020,2020,2020,2020,2021,2021),
week =c(36,37,38,39,22,23,51,52,1,2,38,39),
year = c(2021,2021,2021,2021,2020,2020,2019,2019,2020,2020,2021,2021))
CodePudding user response:
I think you first need to obtain the absolute number of weeks week year * 52
, then filter accordingly. slice_tail
gets the last three rows of each group.
library(dplyr)
data |>
filter(week 52*year <= mala_fide_week 52 *mala_fide_year) |>
group_by(Household_id, channel) |>
arrange(year, week, .by_group = TRUE) |>
slice_tail(n = 3)
# A tibble: 8 x 7
# Groups: Household_id, channel [3]
Household_id channel duration mala_fide_week mala_fide_year week year
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 100 A 34 42 2021 37 2021
2 100 A 567 42 2021 38 2021
3 100 A 67 42 2021 39 2021
4 100 B 34 42 2021 22 2020
5 100 B 67 42 2021 23 2020
6 101 C 23 5 2020 52 2019
7 101 C 56 5 2020 1 2020
8 101 C 89 5 2020 2 2020