counter for unique values found along the vector-CodePudding

first of all let me say that I have searched a lot for this basic question, but none of the answers found seems to do the job. If this specific question has already an answer, please excuse me.

I want to count the occurrence of behaviours in my data.

mydata <- data.frame(BH=c(
"sniff","explore","walking","explore","walking","trotting","sniff","explore","trotting","trotting","walking","walking","walking","watch","walking","trotting","watch","walking","walking","walking"))

and the output has to be like this

myoutput <- data.frame(
BH=c(
"sniff","explore","walking","explore","walking","trotting","sniff","explore","trotting","trotting","walking","walking","walking","watch","walking","trotting","watch","walking","walking","walking"),
mycount=c(
1,2,3,3,3,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5))

I have experimented using ave and n_distinct from dplyr package, but I only get the count of a given behaviour, not the cumulative count.

Any help or hint on how to solve this problem would be appreciate.

Stef

CodePudding user response：

This is easy with a group-by operation and cumsum. I like using package data.table.

library(data.table)
setDT(mydata)
mydata[, mycount := c(1, rep(0, .N - 1)), by = BH] #first occurences
mydata[, mycount := cumsum(mycount)]

all.equal(setDF(mydata), myoutput)
#[1] TRUE

CodePudding user response：

Here is a solution with tidyverse - not as concise as Roland`s solution, but it works.

library(tidyverse)

x <- mydata |> 
  mutate(rn = row_number())
  
x |> 
  group_by(BH) |> 
  mutate(id = cur_group_id()) |> 
  ungroup() |> 
  pivot_wider(names_from = BH, 
              values_from = id, 
              values_fill = 0) |> 
  mutate(across(
    sniff:watch, ~ cumsum(.x) > 0, .names = "{.col}_temp"),
    mycount = rowSums(across(ends_with('_temp')))
    ) |>
  dplyr::select(c(rn:watch, mycount)) |> 
  right_join(x, by = 'rn') |> 
  pivot_longer(-c(rn, mycount, BH)) |> 
  filter(value !=0) |> 
  dplyr::select(BH, mycount)
#> # A tibble: 20 × 2
#>    BH       mycount
#>    <chr>      <dbl>
#>  1 sniff          1
#>  2 explore        2
#>  3 walking        3
#>  4 explore        3
#>  5 walking        3
#>  6 trotting       4
#>  7 sniff          4
#>  8 explore        4
#>  9 trotting       4
#> 10 trotting       4
#> 11 walking        4
#> 12 walking        4
#> 13 walking        4
#> 14 watch          5
#> 15 walking        5
#> 16 trotting       5
#> 17 watch          5
#> 18 walking        5
#> 19 walking        5
#> 20 walking        5