Creating a Counter Function with dplyr

Time:01-23

I have the following dataset:

my_data = structure(list(id = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B"), result = c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", 
"BBB", "DDD")), class = "data.frame", row.names = c(NA, -9L))

For each unique ID, I am interested in learning how to write a general function that updates a "counter variable" under certain conditions.

For example - for each unique ID:

  • if result = AAA then counter restarts to 0
  • if result = BBB then counter = counter + 1
  • if result = CCC then counter = counter + 2
  • if result = DDD then counter = counter - 1.

Here is my attempt to do this:

library(dplyr)

my_data %>%
  group_by(id) %>%
  mutate(counter = ifelse(result == "AAA", 0, 
         ifelse(result == "BBB", 1, 
     ifelse(result == "CCC", 2, 
         ifelse(result == "DDD", -1, 
     ifelse(result == 1, cumsum(result), 0))))))

However, I don't think the results are correct - when I look at the results:

# A tibble: 9 x 3
# Groups:   id [2]
  id    result counter
  <chr> <chr>    <dbl>
1 A     AAA          0
2 A     BBB          1
3 A     CCC          2
4 A     AAA          0
5 A     CCC          2
6 B     DDD         -1
7 B     BBB          1
8 B     BBB          1
9 B     DDD         -1

In the third row, the value of the counter should be 3: 0 (result from row 1) + 1 (result from row 2) + 2 (result from row 3).

I am trying different ways to debug this function to match my requirements, but so far nothing is working.

Can someone please show me how to correct this?

Thanks!

Note: I think the expected output should look something like this:

  id result counter
1  A    AAA       0
2  A    BBB       1
3  A    CCC       3
4  A    AAA       0
5  A    CCC       2
6  B    DDD      -1
7  B    BBB       0
8  B    BBB       1
9  B    DDD       0

CodePudding user response:

It may be easier to join with a key/value dataset and then take the cumsum. To make sure the counter resets at 0, we additionally group on runs of zero vs. nonzero values.

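library(data.table)  # for rleid()
library(dplyr)

# key/value lookup: increment applied by each result (0 marks a reset)
keydat <- tibble(result = c('AAA', 'BBB', 'CCC', 'DDD'), val = c(0, 1, 2, -1))

my_data %>%
   left_join(keydat, by = "result") %>%
   # a new run starts whenever val switches between zero and nonzero
   group_by(id, grp = rleid(val == 0)) %>%
   mutate(counter = cumsum(val) * (val != 0)) %>%
   ungroup() %>%
   select(-grp, -val)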

-output

# A tibble: 9 × 3
  id    result counter
  <chr> <chr>    <dbl>
1 A     AAA          0
2 A     BBB          1
3 A     CCC          3
4 A     AAA          0
5 A     CCC          2
6 B     DDD         -1
7 B     BBB          0
8 B     BBB          1
9 B     DDD          0
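
To see what the extra grouping does, here is a small sketch on the val column for id A only. The vector val is typed in by hand for illustration, and ave() stands in for the grouped mutate() step:

```r
library(data.table)

# val column for id "A" after the join: AAA = 0, BBB = 1, CCC = 2
val <- c(0, 1, 2, 0, 2)

# rleid() numbers each run of identical values, so every reset row
# (val == 0) starts a new run and the rows after it form their own run
grp <- rleid(val == 0)
grp
#> [1] 1 2 2 3 4

# cumulative sum within each run; ave() applies cumsum per group
counter <- ave(val, grp, FUN = cumsum)
counter
#> [1] 0 1 3 0 2
```

Because every run is either all zero or all nonzero, the running sum can never carry over a reset row.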

Or a similar approach in data.table

library(data.table)
setDT(my_data)[as.data.table(keydat), 
  counter := val, on = .(result)][,
    counter := cumsum(counter) *(counter != 0), 
  .(id, grp = rleid(counter == 0))][]

-output

    id result counter
1:  A    AAA       0
2:  A    BBB       1
3:  A    CCC       3
4:  A    AAA       0
5:  A    CCC       2
6:  B    DDD      -1
7:  B    BBB       0
8:  B    BBB       1
9:  B    DDD       0

CodePudding user response:

You can use rle:

library(dplyr)

my_data <- structure(list(id = c("A", "A", "A", "A", "A", "B", "B", "B", 
"B"), result = c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", 
"BBB", "DDD")), class = "data.frame", row.names = c(NA, -9L))


my_data %>% 
  group_by(id, rle = with(rle(result=="AAA"), rep(seq(values),lengths))) %>% 
  mutate(cnt = cumsum(case_when(
    result == "AAA" ~ 0, 
    result == "BBB" ~ 1,
    result == "CCC" ~ 2,
    result == "DDD" ~ -1)))

#> # A tibble: 9 × 4
#> # Groups:   id, rle [5]
#>   id    result   rle   cnt
#>   <chr> <chr>  <int> <dbl>
#> 1 A     AAA        1     0
#> 2 A     BBB        2     1
#> 3 A     CCC        2     3
#> 4 A     AAA        3     0
#> 5 A     CCC        4     2
#> 6 B     DDD        4    -1
#> 7 B     BBB        4     0
#> 8 B     BBB        4     1
#> 9 B     DDD        4     0

Add %>% ungroup() %>% select(-rle) to remove the auxiliary rle column (ungroup first, because select() keeps grouping columns).
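
The grouping expression is dense, so here is a sketch of what it computes, unpacked in base R. The names runs and run_id are only for illustration, and seq_along() is used in place of seq() as a slightly safer equivalent:

```r
result <- c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", "BBB", "DDD")

# rle() compresses the logical vector into runs (values + lengths)
runs <- rle(result == "AAA")

# number the runs 1, 2, 3, ... and repeat each number over its run
run_id <- rep(seq_along(runs$values), runs$lengths)
run_id
#> [1] 1 2 2 3 4 4 4 4 4
```

These are the same ids shown in the rle column of the output above.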

CodePudding user response:

1) Group by id and by each run of consecutive rows that starts with AAA, then take the indicated cumulative sum within each such group.

library(dplyr)

counter <- function(x) cumsum( (x == "BBB") + 2 * (x == "CCC") - (x == "DDD") )
my_data %>%
  group_by(id, g = cumsum(result == "AAA")) %>%
  mutate(counter = counter(result)) %>%
  ungroup %>%
  select(-g)

giving:

# A tibble: 9 x 3
  id    result counter
  <chr> <chr>    <dbl>
1 A     AAA          0
2 A     BBB          1
3 A     CCC          3
4 A     AAA          0
5 A     CCC          2
6 B     DDD         -1
7 B     BBB          0
8 B     BBB          1
9 B     DDD          0
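
As a rough illustration of why this grouping works, the sketch below prints the intermediate label g and applies the cumulative sum to the first block of rows. The counter function is restated so that the snippet runs on its own:

```r
result <- c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", "BBB", "DDD")

# each AAA bumps the running total, so all rows up to the next AAA share a label
g <- cumsum(result == "AAA")
g
#> [1] 1 1 1 2 2 2 2 2 2

# restated so the snippet runs on its own: cumulative sum of the per-row steps
counter <- function(x) cumsum((x == "BBB") + 2 * (x == "CCC") - (x == "DDD"))

# first block of id "A" (g == 1): AAA contributes 0, then +1, then +2
counter(result[1:3])
#> [1] 0 1 3
```

Row 5 (id A) and rows 6 to 9 (id B) all get g = 2, which is why id is also part of the grouping: it keeps the two ids in separate groups.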

2) or with base R using counter from above:

transform(my_data, counter = 
  unlist(tapply(result, paste(id, cumsum(result == "AAA")), counter)))
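
One caveat: tapply() returns its groups in sorted label order, so unlist(tapply(...)) lines up with the rows only when the data is already ordered that way, as it is here. A possible alternative, sketched below with ave(), writes each group's result back into its original row positions. The helper vector step is introduced only for this illustration:

```r
my_data <- data.frame(
  id     = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  result = c("AAA", "BBB", "CCC", "AAA", "CCC", "DDD", "BBB", "BBB", "DDD")
)

# per-row increment: BBB = +1, CCC = +2, DDD = -1, AAA = 0
step <- with(my_data, (result == "BBB") + 2 * (result == "CCC") - (result == "DDD"))

# ave() applies cumsum within each (id, reset-run) group and keeps row order
my_data$counter <- ave(step, my_data$id, cumsum(my_data$result == "AAA"),
                       FUN = cumsum)
my_data$counter
#> [1]  0  1  3  0  2 -1  0  1  0
```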