Filter individual date ranges between events

Time:05-18

I am currently working on a dataset of participants who are followed for a year and perform a physical test about once a month. The test dates vary between individuals. Between tests, participants fill out psychological questions twice a day, and I would like to analyse that data. Therefore, for every individual, I would like to keep all rows between the first and the last test day, preserving all psychological data in between.

My current approach is based upon this answer: How to filter rows in dataframe that are not within a certain timeframe of an event value in another row?

library(dplyr)

# Simplified dataset

set.seed(1)
day_count <- c(1:8,12:20,14:26)
date <- as.Date(c(1:8,12:20,14:26), origin = Sys.Date())
id <- c(rep("A",9),rep("B",9),rep("C",12))
mood <- c(sample(1:100, 9),sample(1:100, 9),sample(1:100, 12))
ISRT <- c(c(NA,100,NA,NA,NA,NA,NA,90,NA),
        c(NA,NA,70,NA,NA,NA,80,NA,NA),
        c(90,NA,NA,100,NA,NA,50,NA,NA,NA,10,NA))

dat <- tibble(day_count, date, id, mood, ISRT)  # data_frame() is deprecated

dat <-  dat %>% mutate(test_day = !is.na(ISRT))

dat_between_tests <- dat %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d")) %>%
  group_by(id) %>%
  # keep rows within one day of any of the participant's test days
  filter(Reduce(`|`, purrr::map(date[test_day == TRUE],
                                ~ dplyr::between(date, .x - 1, .x + 1))))

I have included one day before and after each test day because the approach does not work without that margin, although ideally I would not need it. In this simplified example the approach seems to work, but when I run it on my own dataset I receive the following error:

Error:
! Problem with `filter()` input `..1`.
ℹ Input `..1` is `Reduce(...)`.
✖ Input `..1` must be of size 172 or 1, not size 0.
ℹ The error occurred in group 4: id = "1cf91d6c2f7ddfbd68b93dbc04a4c667".

Does anyone know what causes this and how I can resolve this error? Could it have something to do with the occurrence of multiple test days throughout the study period?

CodePudding user response:

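The error indicates that this id has no test days: `date[test_day == TRUE]` is then empty, `Reduce()` returns `NULL`, and `filter()` receives an input of size 0. To exclude that particular id from your dataset you could try: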

dat_between_tests <- dat %>%
  mutate(date = as.Date(date, format="%Y-%m-%d")) %>%   
  filter(id != "1cf91d6c2f7ddfbd68b93dbc04a4c667") %>% # this should exclude the id with no test days
  group_by(id) %>%
  filter(Reduce(`|`,
                purrr::map(date[test_day == TRUE],
                           ~ dplyr::between(date, .x - 1, .x + 1))))
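
If the goal is simply to keep everything from the first to the last test day per participant, a more general option might be to drop every group without a test day first, and then compare each date with that group's first and last test date. The following is only a minimal sketch under assumptions: the `toy` data, the participant "D" without any tests, and the name `between_first_last` are made up for illustration; only the columns `id`, `date`, `ISRT` and `test_day` follow the question.

```r
library(dplyr)

# Hypothetical toy data: participant "D" never performs a test
toy <- tibble(
  id   = c("A", "A", "A", "A", "D", "D"),
  date = as.Date("2024-01-01") + c(0, 1, 2, 3, 0, 1),
  ISRT = c(NA, 100, NA, 90, NA, NA)
) %>%
  mutate(test_day = !is.na(ISRT))

# Reduce() over an empty list returns NULL, which is what filter()
# receives for a group without test days in the original approach
is.null(Reduce(`|`, list()))

between_first_last <- toy %>%
  group_by(id) %>%
  # step 1: drop participants without any test day
  filter(any(test_day)) %>%
  # step 2: keep rows from the first to the last test day (inclusive)
  filter(date >= min(date[test_day]),
         date <= max(date[test_day])) %>%
  ungroup()

between_first_last
```

Because groups without tests are removed in a separate `filter()` step, `min()` and `max()` are never called on an empty vector. The comparison is inclusive, so no margin of one day around the test days should be needed.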