Home > Software design >  How to separate a time series panel by the number of missing observations at the end?
How to separate a time series panel by the number of missing observations at the end?

Time:03-04

Consider a set of time series having the same length. Some have missing data in the end, due to the product being out of stock, or due to delisting.

If the series contains at least four missing observations (in my case it is value = 0 and not NA) at the end, I consider the series as delisted.

In my time series panel, I want to separate the series with delisted id's from the other ones and create two different dataframes based on this separation.

I created a simple reprex to illustrate the problem:

library(tidyverse)
library(lubridate)

data <- tibble(id = as.factor(c(rep("1",24),rep("2",24))),
               date = rep(c(ymd("2013-01-01")  months(0:23)),2),
               value = c(c(rep(1,17),0,0,0,0,2,2,3), c(rep(9,20),0,0,0,0))
              )

I am searching for a pipeable tidyverse solution.

CodePudding user response:

Here is one possibility to find delisted ids

data %>%
  group_by(id) %>%
  mutate(delisted = all(value[(n()- 3):n()] == 0)) %>%
  group_by(delisted) %>%
  group_split()

In the end I use group_split to split the data into two parts: one containing delisted ids and the other one contains the non-delisted ids.

  • Related