find the first value multiple times in a time series in R

Time:10-01

I am trying to find specific values in a time series, in this case the first value of each run, and there are several such runs. The data looks like this:

data <- data.table::data.table(value = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1),
                               time  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21))

Now I want to find the first row of each run where value == 1 and return the corresponding time. For this data the result should be time: 4, 14, 18

It would also be a great help if the solution were flexible, so that a small number of consecutive zeros can be skipped. In this case the result for time would be 4, 14, because skipping the 2 zeros in the middle (times 16 and 17) joins the run that starts at 18 to the one that starts at 14.

I already tried a solution with which(min), but it only returns the very first match and not the first value of each later run.

CodePudding user response:

Using rleid from data.table, which gives every run of identical values its own id:

data$time[!duplicated(data.table::rleid(data$value)) & data$value == 1]
#[1]  4 14 18
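
For comparison, here is a minimal base R sketch of the same idea without data.table. It assumes that value only ever contains 0 and 1, so the start of a run of ones is exactly the place where the series steps up by 1. The names steps and first_times are only illustrative.

```r
data <- data.frame(
  value = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1),
  time  = 1:21
)

# Prepend a 0 so that a series starting with 1 still registers a 0 -> 1 step.
steps <- diff(c(0, data$value))

# A step of +1 marks the first element of a run of ones.
first_times <- data$time[steps == 1]
first_times
# [1]  4 14 18
```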

If you want to skip up to a given number of consecutive zeros, you can use this function.

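skip_zero <- function(df, n = 0) {
  # Label each run of identical values
  inds <- data.table::rleid(df$value)
  # Turn zero runs of length <= n into ones so the neighbouring runs merge
  df$value[ave(inds, inds, FUN = length) <= n & df$value == 0] <- 1
  # Recompute the runs and return the time at the start of each run of ones
  inds <- data.table::rleid(df$value)
  df$time[!duplicated(inds) & df$value == 1]
}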

skip_zero(data)
#[1]  4 14 18

skip_zero(data, 2)
#[1]  4 14
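
One thing to watch with skip_zero: as far as I can tell, a short run of zeros at the very start of the series is also filled, so skip_zero(data, 3) would report time 1 as a first value. Below is a hedged base R sketch that only fills zero runs lying between two runs of ones. The function name first_ones and the helper variable interior are made up for this illustration, and the sketch again assumes that value contains only 0 and 1.

```r
value <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
time  <- seq_along(value)

first_ones <- function(value, time, n = 0) {
  r <- rle(value)
  k <- length(r$values)
  # A run is "interior" when it is neither the first nor the last run.
  interior <- seq_len(k) > 1 & seq_len(k) < k
  # Treat short interior zero runs as ones so the neighbouring runs merge.
  r$values[r$values == 0 & r$lengths <= n & interior] <- 1
  filled <- inverse.rle(r)
  # A run of ones starts wherever the filled series steps from 0 to 1.
  time[diff(c(0, filled)) == 1]
}

first_ones(value, time)      # no zeros skipped
# [1]  4 14 18
first_ones(value, time, 2)   # the zeros at times 16 and 17 are skipped
# [1]  4 14
first_ones(value, time, 3)   # the three leading zeros are left alone
# [1]  4 14
```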

CodePudding user response:

You need to create a group for each consecutive run of 0s and 1s. Then you can group by it and keep the first row of each group, provided its value is 1.

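library(dplyr)

data %>%
  # A new group starts whenever value differs from the previous row
  mutate(group = cumsum(value != lag(value, 1, default = TRUE))) %>%
  group_by(group) %>%
  filter(row_number() == 1, value == 1) %>%
  # Ungroup first, otherwise select() keeps the grouping column
  ungroup() %>%
  select(-group)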

CodePudding user response:

with(
  rle(data$value),
  c(1, 1 + cumsum(lengths))[which(values == 1)]
)
# [1]  4 14 18
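
Note that the rle() approach returns row positions, which match time here only because time runs 1, 2, 3, and so on. A small sketch of how the positions could be mapped back to arbitrary timestamps follows. The time vector below is hypothetical and chosen only to show the difference, and starts and first_times are illustrative names.

```r
value <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
# Hypothetical timestamps that are not simply 1, 2, 3, ...
time  <- seq(10, by = 5, length.out = length(value))

runs <- rle(value)

# Position at which each run starts: 1 for the first run,
# then one past the end of every previous run.
starts <- c(1, head(cumsum(runs$lengths), -1) + 1)

# Keep the starts of the runs of ones and look up their timestamps.
first_times <- time[starts[runs$values == 1]]
first_times
# [1] 25 75 95
```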