I have a data frame that looks as follows:
ID# | Days Since Activity |
---|---|
1 | 1 |
1 | 5 |
1 | 5 |
1 | 10 |
1 | 3 |
2 | 4 |
2 | 2 |
2 | 9 |
And I want to include a "count" column that counts all entries and resets when the ID# changes OR if the "Days Since Activity" is 10 or greater. An example output is as follows:
ID# | Days Since Activity | Count |
---|---|---|
1 | 1 | 1 |
1 | 5 | 2 |
1 | 5 | 3 |
1 | 10 | 1 |
1 | 3 | 2 |
2 | 4 | 1 |
2 | 2 | 2 |
2 | 9 | 3 |
I tried doing the following code:
df <- df %>%
group_by(`ID#`) %>%
mutate(`Count` = seq(n()))
But that only reset the count when the ID# changed. How can I incorporate both criteria into my count? Thank you!!
CodePudding user response:
We can create logical flags every time the ID changes or the days since activity exceed 10 (plus special handling for the first row), accumulate these as a cumulative sum, and then use that as a grouping variable to create the numbering.
I've left the extra columns in the final output for the purposes of illustration.
# example data
df <- data.frame(
id = c(rep(1, 5), rep(2, 3)),
days = c(1, 5, 5, 10, 3, 4, 2, 9)
)
df %>%
mutate(
id_change = replace_na(id != lag(id), TRUE),
days_exceeded = days >= 10,
group = cumsum(id_change | days_exceeded)
) %>%
group_by(group) %>%
mutate(count = row_number())
id days id_change days_exceeded group count
<dbl> <dbl> <lgl> <lgl> <int> <int>
1 1 1 TRUE FALSE 1 1
2 1 5 FALSE FALSE 1 2
3 1 5 FALSE FALSE 1 3
4 1 10 FALSE TRUE 2 1
5 1 3 FALSE FALSE 2 2
6 2 4 TRUE FALSE 3 1
7 2 2 FALSE FALSE 3 2
8 2 9 FALSE FALSE 3 3
CodePudding user response:
df %>%
group_by(id, count = cumsum(days>=10)) %>%
mutate(count = row_number())
# A tibble: 8 x 3
# Groups: id, count [6]
id days count
<dbl> <dbl> <int>
1 1 1 1
2 1 5 2
3 1 5 3
4 1 10 1
5 1 3 2
6 2 4 1
7 2 2 2
8 2 9 3