Home > Software design >  Creating subset of dataset based on multiple condition in r
Creating subset of dataset based on multiple condition in r

Time:02-22

I want to extract the past 3 weeks' data for each household_id, channel combination. These past 3 weeks will be calculated from mala_fide_week and mala_fide_year and it will be less than that for each household_id and channel combination.

Below is the dataset:

enter image description here

for e.g. Household_id 100 for channel A: the mala_fide_week is 42 and mala_fide_year 2021. So past three records will be less than week 42 of the year 2021. This will be calculated from the week and year columns.

For the Household_id 100 and channel B combination, there are only two records much less than mala_fide_week and mala_fide_year.

For Household_id 101 and channel C, there are two years involved in 2019 and 2020.

The final dataset will be as below

enter image description here

Household_id 102 is not considered as week and year is greater than mala_fide_week and mala_fide_year.

I am trying multiple options but not getting through. Any help is much appreciated!

sample dataset:

 data <- data.frame(Household_id = 
 c(100,100,100,100,100,100,101,101,101,101,102,102),
               channel = c("A","A","A","A","B","B","C","C","c","C","D","D"),
               duration = c(12,34,567,67,34,67,98,23,56,89,73,76),
               mala_fide_week = c(42,42,42,42,42,42,5,5,5,5,30,30),
               mala_fide_year =c(2021,2021,2021,2021,2021,2021,2020,2020,2020,2020,2021,2021),
               week =c(36,37,38,39,22,23,51,52,1,2,38,39),
               year = c(2021,2021,2021,2021,2020,2020,2019,2019,2020,2020,2021,2021))

CodePudding user response:

I think you first need to obtain the absolute number of weeks week year * 52, then filter accordingly. slice_tail gets the last three rows of each group.

library(dplyr)

data |>
  filter(week   52*year <= mala_fide_week   52 *mala_fide_year) |>
  group_by(Household_id, channel)  |>
  arrange(year, week, .by_group = TRUE) |>
  slice_tail(n = 3)

# A tibble: 8 x 7
# Groups:   Household_id, channel [3]
  Household_id channel duration mala_fide_week mala_fide_year  week  year
         <dbl> <chr>      <dbl>          <dbl>          <dbl> <dbl> <dbl>
1          100 A             34             42           2021    37  2021
2          100 A            567             42           2021    38  2021
3          100 A             67             42           2021    39  2021
4          100 B             34             42           2021    22  2020
5          100 B             67             42           2021    23  2020
6          101 C             23              5           2020    52  2019
7          101 C             56              5           2020     1  2020
8          101 C             89              5           2020     2  2020
  •  Tags:  
  • r
  • Related