I am trying to find specific values in a time series several times over, in this case the first value of each run. The data looks like this:
```r
data <- data.table::data.table(
  value = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1),
  time  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
)
```
Now I want to find the first value of each run where value == 1 and get the corresponding time. For this data the result should be time 4, 14, 18.
It would also be a great help if the solution were flexible enough to skip short runs of zeros. In this case the result would be time 4, 14, because the two zeros at times 16 and 17 can be skipped, so the runs starting at 14 and 18 merge into one.
I already tried a solution with which() and min(), but it only gives me the first such value, not the first value of each later run.
Answer:
Using data.table's rleid():
```r
data$time[!duplicated(data.table::rleid(data$value)) & data$value == 1]
#[1] 4 14 18
```
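If you would rather avoid the data.table dependency, rleid() can be approximated in base R by numbering the runs that rle() reports. This is a minimal sketch assuming the values contain no NA (rle() treats each NA as a separate run); base_rleid is a hypothetical helper name, and the data is rebuilt as plain vectors.

```r
# Example data from the question as plain vectors
value <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
time  <- 1:21

# Hypothetical base R stand-in for data.table::rleid():
# number the runs 1, 2, 3, ... and repeat each id over the rows of its run
base_rleid <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

ids <- base_rleid(value)

# Same selection as the rleid() answer: first row of each run, kept only when value == 1
time[!duplicated(ids) & value == 1]
```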
If you want to skip runs of up to n consecutive zeros, you can use this function:
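```r
skip_zero <- function(df, n = 0) {
  # id of the run each row belongs to
  inds <- data.table::rleid(df$value)
  # turn zero runs of length <= n into ones; the first run is left alone so a
  # short leading zero run is not reported as a start
  df$value[ave(inds, inds, FUN = length) <= n & df$value == 0 & inds != 1] <- 1
  # recompute the runs after filling the gaps, then take the first row of each run of ones
  inds <- data.table::rleid(df$value)
  df$time[!duplicated(inds) & df$value == 1]
}
skip_zero(data)
#[1] 4 14 18
skip_zero(data, 2)
#[1] 4 14
```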
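A base R sketch of the same gap-skipping idea can work on the run-length encoding directly: short zero runs that sit between two runs of ones are relabelled as ones, then the runs are expanded again. It assumes value holds only 0 and 1 with no NA; first_one_times is a hypothetical name, and never filling the first or last run (so a short leading zero run is not reported as a start) is a design choice of this sketch.

```r
# Hypothetical helper: time at which each run of ones starts, after treating
# zero runs of length <= n that lie between two runs of ones as ones.
# Assumes `value` contains only 0/1 and no NA.
first_one_times <- function(value, time, n = 0) {
  r <- rle(value)
  k <- length(r$values)
  pos <- seq_len(k)
  # with 0/1 data the runs alternate, so an interior zero run is flanked by ones
  interior <- pos > 1 & pos < k
  fill <- r$values == 0 & r$lengths <= n & interior
  r$values[fill] <- 1
  filled <- inverse.rle(r)
  # a run starts at row 1 and wherever the (filled) value changes
  starts <- c(TRUE, diff(filled) != 0)
  time[starts & filled == 1]
}

value <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
time  <- 1:21

first_one_times(value, time)         # no gaps skipped: 4, 14, 18
first_one_times(value, time, n = 2)  # the 2 zeros at 16-17 are skipped: 4, 14
first_one_times(value, time, n = 6)  # both gaps skipped: 4
```

Because the first and last runs are never filled, first_one_times(value, time, n = 3) still returns 4, 14 rather than reporting time 1.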
Answer:
Create a group ID for each consecutive run of 0s or 1s. Then group by it and keep the first row of each group, provided that group is a run of ones. With dplyr:
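```r
library(dplyr)
data %>%
  # new group whenever value changes (-1 never occurs, so row 1 opens a group)
  mutate(group = cumsum(value != lag(value, default = -1))) %>%
  group_by(group) %>%
  filter(row_number() == 1, value == 1) %>%  # first row of each run of ones
  ungroup() %>%                               # so select() can drop group
  select(-group)
```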
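To see what the group column in that pipeline holds, the same running counter can be sketched in base R, with diff() standing in for the comparison against lag(). This assumes the question's values as plain vectors; no dplyr is needed.

```r
value <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
time  <- 1:21

# TRUE on row 1 and on every row whose value differs from the previous row;
# cumsum() turns those change points into a run counter
group <- cumsum(c(TRUE, diff(value) != 0))
# group is 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 5 5 6 6 6 6

# First row of each group, kept only for runs of ones
time[!duplicated(group) & value == 1]
```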
Answer:
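A base R approach using rle():
```r
with(
  rle(data$value),
  # runs start at row 1 and at 1 + the cumulative length of the preceding runs
  data$time[c(1, 1 + cumsum(lengths))[which(values == 1)]]
)
# [1] 4 14 18
```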
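Another way to read the rle() output is to compute where each run ends and step back to where it starts. This sketch also looks the starts up in time rather than returning row positions, which only coincide with time here because time runs from 1 to 21. Plain vectors are assumed.

```r
value <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
time  <- 1:21

r <- rle(value)
run_end   <- cumsum(r$lengths)        # last row of each run: 3 7 13 15 17 21
run_start <- run_end - r$lengths + 1  # first row of each run: 1 4 8 14 16 18

# Starting rows of the runs of ones, mapped to their times
time[run_start[r$values == 1]]
```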