Home > Net >  Idenitfying rows within a group which meets a condtion
Idenitfying rows within a group which meets a condtion

Time:10-02

I have a table with the following

ID Days Condition
1 1 F
1 2 F
1 4 T
1 10 F
1 11 F
1 12 T
2 1 F
2 2 F
2 14 T
2 20 F
2 24 T
2 30 F

How do Identify all ids and days which fall between condition ==True.

For eg

ID 1 has day 4 condition =T

ID 1 has day 12 condition =T

similarly

ID 2 has day 14 condiiton =T

ID 2 has day 24 condiiton =T

Need all the days and Ids between these the code will return ;

ID Days Condition
1 4 T
1 10 F
1 11 F
1 12 T
2 14 T
2 20 F
2 24 T

Iam looking for a dplyr solution but base R is also fine . Completely lost on how to do this ..

Thanks

CodePudding user response:

You can create a mask and use a filtering join:

library(tidyverse)

data <- tibble::tribble(
  ~ID, ~Days, ~Condition,
   1L,    1L,      FALSE,
   1L,    2L,      FALSE,
   1L,    4L,       TRUE,
   1L,   10L,      FALSE,
   1L,   11L,      FALSE,
   1L,   12L,       TRUE,
   2L,    1L,      FALSE,
   2L,    2L,      FALSE,
   2L,   14L,       TRUE,
   2L,   20L,      FALSE,
   2L,   24L,       TRUE,
   2L,   30L,      FALSE
  )


selcted_id_days <-
  data %>%
  filter(Condition) %>%
  group_by(ID) %>%
  summarise(
    min = min(Days),
    max = max(Days)
  ) %>%
  mutate(range = min %>% map2(max, ~ seq(.x, .y))) %>%
  unnest(range) %>%
  distinct(ID, Days = range)

selcted_id_days
#> # A tibble: 20 x 2
#>       ID  Days
#>    <int> <int>
#>  1     1     4
#>  2     1     5
#>  3     1     6
#>  4     1     7
#>  5     1     8
#>  6     1     9
#>  7     1    10
#>  8     1    11
#>  9     1    12
#> 10     2    14
#> 11     2    15
#> 12     2    16
#> 13     2    17
#> 14     2    18
#> 15     2    19
#> 16     2    20
#> 17     2    21
#> 18     2    22
#> 19     2    23
#> 20     2    24

data %>% semi_join(selcted_id_days)
#> Joining, by = c("ID", "Days")
#> # A tibble: 7 x 3
#>      ID  Days Condition
#>   <int> <int> <lgl>    
#> 1     1     4 TRUE     
#> 2     1    10 FALSE    
#> 3     1    11 FALSE    
#> 4     1    12 TRUE     
#> 5     2    14 TRUE     
#> 6     2    20 FALSE    
#> 7     2    24 TRUE

Created on 2021-10-01 by the reprex package (v2.0.1)

CodePudding user response:

Here's how I would do it. It should also work where there might be 2 ranges etc.

df %>%
  group_by(ID) %>%
  mutate(
    n = cumsum(Condition)
  ) %>%
  filter(
    n %% 2 == 1 | Condition
  ) %>%
  select(-n)

edit: fixed typo

  • Related