I have the following dataset:
my_data = structure(list(id = c("A", "A", "A", "A", "A", "B", "B", "B",
"B"), result = c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB",
"BBB", "DDD")), class = "data.frame", row.names = c(NA, -9L))
For each unique ID, I am interested in learning how to write a general function that updates a "counter variable" under certain conditions.
For example - for each unique ID:
- if result = AAA then counter restarts at 0
- if result = BBB then counter = counter + 1
- if result = CCC then counter = counter + 2
- if result = DDD then counter = counter - 1.
Here is my attempt to do this:
library(dplyr)
my_data %>%
group_by(id) %>%
mutate(counter = ifelse(result == "AAA", 0,
ifelse(result == "BBB", 1,
ifelse(result == "CCC", 2,
ifelse(result == "DDD", -1,
ifelse(result == 1, cumsum(result), 0))))))
However, I don't think the results are correct - when I look at the results:
# A tibble: 9 x 3
# Groups: id [2]
id result counter
<chr> <chr> <dbl>
1 A AAA 0
2 A BBB 1
3 A CCC 2
4 A AAA 0
5 A CCC 2
6 B DDD -1
7 B BBB 1
8 B BBB 1
9 B DDD -1
In the third row, the value of the counter should be 3: 0 (result from row 1) + 1 (result from row 2) + 2 (result from row 3).
I am trying different ways to debug this function to match my requirements, but so far nothing is working.
Can someone please show me how to correct this?
Thanks!
Note: I think the expected output should look something like this:
id result counter
1 A AAA 0
2 A BBB 1
3 A CCC 3
4 A AAA 0
5 A CCC 2
6 B DDD -1
7 B BBB 0
8 B BBB 1
9 B DDD 0
CodePudding user response:
It may be easier to join with a key/value dataset and then take the cumsum.
To make sure the counter resets at 0, we additionally group on runs of zero vs. nonzero values:
library(data.table)
library(dplyr)
keydat <- tibble(result = c('AAA', 'BBB', 'CCC', 'DDD'), val = c(0, 1, 2, -1))
my_data %>%
left_join(keydat, by = "result") %>%
group_by(id, grp = rleid(val == 0)) %>%
mutate(counter = cumsum(val) *(val != 0)) %>%
ungroup %>%
select(-grp, -val)
-output
# A tibble: 9 × 3
id result counter
<chr> <chr> <dbl>
1 A AAA 0
2 A BBB 1
3 A CCC 3
4 A AAA 0
5 A CCC 2
6 B DDD -1
7 B BBB 0
8 B BBB 1
9 B DDD 0
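To see why the extra grouping resets the counter, it can help to keep the intermediate columns instead of dropping them. The following is only a sketch of the same pipeline with val and grp left in; the name steps is made up for illustration, and rleid is called as data.table::rleid so that data.table does not have to be attached.

```r
library(dplyr)

my_data <- data.frame(
  id     = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  result = c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", "BBB", "DDD")
)
keydat <- tibble(result = c("AAA", "BBB", "CCC", "DDD"), val = c(0, 1, 2, -1))

# `steps` is a hypothetical name; it keeps val and grp for inspection
steps <- my_data %>%
  left_join(keydat, by = "result") %>%
  # a new run id starts whenever val switches between zero and nonzero
  mutate(grp = data.table::rleid(val == 0)) %>%
  group_by(id, grp) %>%
  # cumulative sum within each run; rows with val == 0 are forced to 0
  mutate(counter = cumsum(val) * (val != 0)) %>%
  ungroup()

print(steps)
```

Each AAA row sits in its own run, so the cumulative sum of the following nonzero rows starts from scratch. Note that this relies on AAA being the only result mapped to 0 in keydat.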
Or a similar approach in data.table
library(data.table)
setDT(my_data)[as.data.table(keydat),
counter := val, on = .(result)][,
counter := cumsum(counter) *(counter != 0),
.(id, grp = rleid(counter == 0))][]
-output
id result counter
1: A AAA 0
2: A BBB 1
3: A CCC 3
4: A AAA 0
5: A CCC 2
6: B DDD -1
7: B BBB 0
8: B BBB 1
9: B DDD 0
CodePudding user response:
You can use rle:
library(dplyr)
my_data <- structure(list(id = c("A", "A", "A", "A", "A", "B", "B", "B",
"B"), result = c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB",
"BBB", "DDD")), class = "data.frame", row.names = c(NA, -9L))
my_data %>%
group_by(id, rle = with(rle(result=="AAA"), rep(seq(values),lengths))) %>%
mutate(cnt = cumsum(case_when(
result == "AAA" ~ 0,
result == "BBB" ~ 1,
result == "CCC" ~ 2,
result == "DDD" ~ -1)))
#> # A tibble: 9 × 4
#> # Groups: id, rle [5]
#> id result rle cnt
#> <chr> <chr> <int> <dbl>
#> 1 A AAA 1 0
#> 2 A BBB 2 1
#> 3 A CCC 2 3
#> 4 A AAA 3 0
#> 5 A CCC 4 2
#> 6 B DDD 4 -1
#> 7 B BBB 4 0
#> 8 B BBB 4 1
#> 9 B DDD 4 0
Add %>% ungroup() %>% select(-rle) to remove the auxiliary rle column (ungroup first, because select keeps grouping columns).
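The grouping expression is fairly dense, so here is a small base R sketch that unpacks it step by step. The names is_reset, runs and run_id are only illustrative and do not appear in the answer above.

```r
result <- c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", "BBB", "DDD")

is_reset <- result == "AAA"   # TRUE on the rows that restart the counter
runs <- rle(is_reset)         # run-length encoding of the TRUE/FALSE vector

runs$lengths                  # 1 2 1 5
runs$values                   # TRUE FALSE TRUE FALSE

# number the runs and repeat each number over the length of its run
run_id <- rep(seq_along(runs$values), runs$lengths)
run_id                        # 1 2 2 3 4 4 4 4 4
```

The last run (id 4) spans rows from both A and B, which is why id also has to be part of group_by.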
CodePudding user response:
1) Group by id and consecutive rows starting with AAA and then take the indicated cumulative sum within each such group.
library(dplyr)
counter <- function(x) cumsum( (x == "BBB") + 2 * (x == "CCC") - (x == "DDD") )
my_data %>%
group_by(id, g = cumsum(result == "AAA")) %>%
mutate(counter = counter(result)) %>%
ungroup %>%
select(-g)
giving:
# A tibble: 9 x 3
id result counter
<chr> <chr> <dbl>
1 A AAA 0
2 A BBB 1
3 A CCC 3
4 A AAA 0
5 A CCC 2
6 B DDD -1
7 B BBB 0
8 B BBB 1
9 B DDD 0
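As a rough illustration of the two building blocks in (1), the sketch below computes the group index and the per-row increments on the bare result vector. The names g and increments are just for this example.

```r
result <- c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", "BBB", "DDD")

# every AAA bumps the group index, so each group starts at a reset row
g <- cumsum(result == "AAA")
g           # 1 1 1 2 2 2 2 2 2

# logicals act as 0/1 in arithmetic, giving the step contributed by each row
increments <- (result == "BBB") + 2 * (result == "CCC") - (result == "DDD")
increments  # 0 1 2 0 2 -1 1 1 -1
```

Because g is computed over the whole column, the rows of B share g = 2 with the last two rows of A; grouping by id as well keeps them apart.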
2) or with base R using counter from above:
transform(my_data, counter =
unlist(tapply(result, paste(id, cumsum(result == "AAA")), counter)))
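3) Since the question asks for a general function, here is a minimal base R sketch that applies the rules one row at a time with Reduce. The function run_counter and its arguments steps and reset are hypothetical names introduced for this example, not part of any package.

```r
my_data <- data.frame(
  id     = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  result = c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", "BBB", "DDD")
)

# Hypothetical helper: `steps` maps each result to its increment,
# `reset` is the result that restarts the counter at 0.
run_counter <- function(result,
                        steps = c(BBB = 1, CCC = 2, DDD = -1),
                        reset = "AAA") {
  update <- function(total, r) if (r == reset) 0 else total + steps[[r]]
  # accumulate = TRUE keeps every running total; [-1] drops the initial 0
  Reduce(update, result, init = 0, accumulate = TRUE)[-1]
}

# apply per id, then put the values back in the original row order
my_data$counter <- unsplit(
  lapply(split(my_data$result, my_data$id), run_counter),
  my_data$id
)
my_data
```

This is slower than the vectorised cumsum versions on large data, but the rules stay in one place and are easy to change. A result value that is neither reset nor named in steps will raise an error.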