Home > front end >  filter rows by condition and add n rows before/after selected row
filter rows by condition and add n rows before/after selected row

Time:03-25

I have a grouped data and I want to select a rows that fulfill a certain condition (works with code provided below), but I also want to add n number of rows before or after that. The code below give me two rows per group and I now for example want to add one row that is before the first row and one row that is after the second row. I could not find a code or function "like add one row", so is there anyway that I can include this in the following code?

daf1 <- df %>% 
  setDT(df) %>% 
  dplyr::mutate(row_id = row_number()) %>% 
  dplyr::group_by(PATIENT.ID) %>% 
  dplyr::mutate(first_yes = cumsum(Caffeinefactor == "yes"), 
         last_no = (Caffeinefactor == "no") * row_number(), 
         last_no = ifelse(first_yes == 0, last_no, 0)) %>% 
  dplyr:: filter((first_yes == 1 & Caffeine >0) | last_no == max(last_no)) %>%
  dplyr::select(-first_yes, -last_no)
DF = structure(list(PATIENT.ID = c(210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 220909L, 
220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 
220909L, 220909L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 303926L, 303926L, 303926L, 303926L
), PATIENT.TREATMENT.NUMBER = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 17L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L), Caffeinefactor = c("no", 
"no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes", 
"yes", "yes", "no", "no", "yes", "yes", "yes", "no", "yes", 
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", 
"no", "no", "no", "no", "no", "no", "yes", "yes", "yes", "yes", 
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "yes", 
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"
)), row.names = c(NA, -60L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x7fe7f7002ee0>)

What the output looks like now:

PATIENT.ID Caffeinefactor header
210625L no
210625L yes
220909L no
220909L yes
301705L no
301705L yes

The output should look like this if one row is added before and after:

PATIENT.ID Caffeinefactor header
210625L no
210625L no
210625L yes
210625L yes
220909L no
220909L yes
220909L yes
301705L no
301705L no
301705L yes
301705L yes

CodePudding user response:

If you want to add a row before and after to each group, then we can use group_split and add an empty row to before and after the data (for each group), then fill the data up and down. This happens inside of map_dfr, so that we can bind it all back together in a dataframe.

library(tidyverse)

DF %>%
  dplyr::mutate(row_id = row_number()) %>%
  dplyr::group_by(PATIENT.ID) %>%
  dplyr::mutate(first_yes = cumsum(Caffeinefactor == "yes"),
                last_no = (Caffeinefactor == "no") * row_number(),
                last_no = ifelse(first_yes == 0, last_no, 0)) %>%
  dplyr:: filter((first_yes == 1 & Caffeinefactor > 0) | last_no == max(last_no)) %>%
  dplyr::select(-first_yes, -last_no) %>%
  group_split() %>%
  map_dfr(~ .x %>% add_row(., .before = 1) %>% add_row(.) %>% fill(everything(), .direction = "downup")) %>% 
  select(PATIENT.ID, Caffeinefactor)

Output

   PATIENT.ID Caffeinefactor
        <int> <chr>         
 1     210625 no            
 2     210625 no            
 3     210625 yes           
 4     210625 yes           
 5     220909 yes           
 6     220909 yes           
 7     220909 yes           
 8     220909 yes           
 9     220909 yes           
10     220909 yes           
# … with 18 more rows

*Note: The output is based on the data below (taken from the OP). The output from the OP does not match the given data or the code provided, so am just working off of this data.

Data

DF <- structure(list(PATIENT.ID = c(210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 220909L, 
220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 
220909L, 220909L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 303926L, 303926L, 303926L, 303926L
), PATIENT.TREATMENT.NUMBER = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 17L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L), Caffeinefactor = c("no", 
"no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes", 
"yes", "yes", "no", "no", "yes", "yes", "yes", "yes", "yes", 
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", 
"no", "no", "no", "no", "no", "no", "yes", "yes", "yes", "yes", 
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "yes", 
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"
)), row.names = c(NA, -60L), class = c("data.table", "data.frame"
))
  •  Tags:  
  • r
  • Related