Cumulative sum in R with conditions

The column signal3 is what I am struggling to code; the values hard-coded below are the output I want.

Here is the code:

library(tibble)
library(dplyr)
library(lubridate)
set.seed(1234)
df <- tibble(signal1 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0), 
             signal2 = rbinom(31, 1, 0.7), 
             signal3 = c(0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 2))

Basically, when signal1 == 1, I want to start counting the consecutive rows where signal2 == 1, and stop (going back to 0) as soon as signal2 == 0.

CodePudding user response:

You can start a new group each time a start condition (signal1 == 1) or an end condition (signal2 == 0) occurs. Within each group, return the cumulative sum of signal2 if the cumulative sum of signal1 is greater than 0 (that is, the group began with a start signal); otherwise return 0.

library(dplyr)

df %>%
  group_by(grp = cumsum(signal1 == 1 | signal2 == 0)) %>%
  mutate(signal3 = ifelse(cumsum(signal1) > 0, cumsum(signal2), 0)) %>%
  ungroup() %>%
  select(-grp)

# A tibble: 31 × 3
   signal1 signal2 signal3
     <dbl>   <int>   <dbl>
 1       0       1       0
 2       0       1       0
 3       1       1       1
 4       0       1       2
 5       0       0       0
 6       0       1       0
 7       0       1       0
 8       0       1       0
 9       0       1       0
10       1       1       1
# … with 21 more rows
# ℹ Use `print(n = ...)` to see more rows
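
To see how the grouping behaves, it can help to keep the grp column and run the same pipeline on a small hand-written input. The sketch below is only an illustration: the toy data and the names toy and toy_out are made up here and are not the question's random data.

```r
library(tibble)
library(dplyr)

# Hypothetical toy input: two start signals (rows 2 and 6) and one stop (row 4)
toy <- tibble(
  signal1 = c(0, 1, 0, 0, 0, 1, 0),
  signal2 = c(1, 1, 1, 0, 1, 1, 1)
)

toy_out <- toy %>%
  # A new group begins at every start (signal1 == 1) or stop (signal2 == 0) row
  mutate(grp = cumsum(signal1 == 1 | signal2 == 0)) %>%
  group_by(grp) %>%
  # Only groups that began with a start signal have cumsum(signal1) > 0
  mutate(signal3 = ifelse(cumsum(signal1) > 0, cumsum(signal2), 0)) %>%
  ungroup()

print(toy_out)
# grp is     0 1 1 2 2 3 3
# signal3 is 0 1 2 0 0 1 2
```

Rows 4 and 5 fall into a group opened by a stop row, so they stay at 0 until the next start signal in row 6.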

CodePudding user response:

One way to approach this problem is to use the dplyr package to create a new column that indicates whether the current row corresponds to the beginning of a new "counting period" (i.e., when signal1 is 1). You can then use the lubridate package to create a new column that indicates the duration of each counting period. Finally, you can use dplyr to group the data by the duration of the counting period, and then use the summarize function to count the number of rows within each group.

Here is an example of how you might do this:

library(tibble)
library(dplyr)
library(lubridate)

set.seed(1234)
df <- tibble(signal1 = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0), 
             signal2 = rbinom(31, 1, 0.7), 
             signal3 = c(0, 0, 1, 2, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 2))

# Create a new column indicating whether the current row is the start of a new counting period
df <- df %>%
  mutate(start_period = ifelse(signal1 == 1, 1, 0))

# Create a new column indicating the duration of each counting period
df <- df %>%
  group_by(start_period) %>%
  mutate(duration = seconds_to_period(sum(seconds(1))))

# Group the data by the duration of the counting period and count the number of rows within each group
df %>%
  group_by(duration) %>%
  summarize(count = n())

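This gives one row count per group. Note that summarize() collapses each group to a single row, so the result is a set of group totals rather than a row-by-row running count like signal3; for a running count, a cumulative function such as cumsum() inside mutate() is needed. This solution also assumes that signal1 is always 1 at the start of a new counting period, and that signal2 is always 0 at the end of a counting period. If this is not the case, you may need to adjust the code accordingly.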
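
If a per-row running count like signal3 is the goal, one explicit alternative is a plain base R loop that carries the counting state from row to row. This is a rough sketch rather than a tuned solution: running_count() is a hypothetical helper name, and it assumes that a row with signal2 == 0 always ends the period, even when signal1 == 1 on that same row.

```r
# Hypothetical helper: count consecutive signal2 == 1 rows after a start signal
running_count <- function(signal1, signal2) {
  out <- numeric(length(signal1))
  counting <- FALSE  # are we inside a counting period?
  n <- 0             # running count within the current period
  for (i in seq_along(signal1)) {
    if (signal1[i] == 1) {
      # Start signal: begin a fresh counting period
      counting <- TRUE
      n <- 0
    }
    if (signal2[i] == 0) {
      # Stop signal: end the period (assumed to win over a start on the same row)
      counting <- FALSE
      n <- 0
    } else if (counting) {
      n <- n + 1
    }
    out[i] <- if (counting) n else 0
  }
  out
}

res <- running_count(c(0, 1, 0, 0, 0, 1, 0), c(1, 1, 1, 0, 1, 1, 1))
print(res)
# 0 1 2 0 0 1 2
```

With the question's data this could be used as df$signal3 <- running_count(df$signal1, df$signal2). The loop is slower than a vectorised approach on large data, but the start and stop rules are easy to read and change.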