R - Count consecutive occurrences of a specific number based on a specific group

Time:07-26

Say I have the following data frame in R:

frame object  positive     
1     6       0    
2     6       1    
3     6       1    
4     6       1    
5     6       1      
6     6       0    
7     6       0    
8     6       1 
9     6       1   
10    6       1
1     7       1    
2     7       1   
3     7       1    
4     7       1   
5     7       1      
6     7       0    
7     7       1       
8     7       0    
9     7       1    
10    7       1

I am trying to create a new table that counts the consecutive occurrences of the value 1 in the positive column for each object, and reports the maximum and mean length of those runs. The result would look like:

object  max  mean 
6       4    3.5
7       5    8/3

Thank you for your help!

CodePudding user response:

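Here is a solution that uses data.table::rleid to give each run of consecutive identical values its own id, then counts the length of every run of 1s within each object. The data.table package must be installed, but it does not need to be attached.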

library("tidyverse")

df <- tibble::tribble(
  ~frame, ~object, ~positive,
  1L, 6L, 0L,
  2L, 6L, 1L,
  3L, 6L, 1L,
  4L, 6L, 1L,
  5L, 6L, 1L,
  6L, 6L, 0L,
  7L, 6L, 0L,
  8L, 6L, 1L,
  9L, 6L, 1L,
  10L, 6L, 1L,
  1L, 7L, 1L,
  2L, 7L, 1L,
  3L, 7L, 1L,
  4L, 7L, 1L,
  5L, 7L, 1L,
  6L, 7L, 0L,
  7L, 7L, 1L,
  8L, 7L, 0L,
  9L, 7L, 1L,
  10L, 7L, 1L
)
df %>%
  group_by(object) %>%
  mutate(
    sequence = data.table::rleid(positive == 1),
  ) %>%
  filter(
    positive == 1
  ) %>%
  group_by(
    object, sequence
  ) %>%
  summarise(
    length = n()
  ) %>%
  summarise(
    max = max(length),
    mean = mean(length)
  )
#> `summarise()` has grouped output by 'object'. You can override using the
#> `.groups` argument.
#> # A tibble: 2 × 3
#>   object   max  mean
#>    <int> <int> <dbl>
#> 1      6     4  3.5 
#> 2      7     5  2.67

Created on 2022-07-26 by the reprex package (v2.0.1)
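
To make the run-length idea concrete, here is a minimal base R sketch on the positive values of object 6 from the question. It uses base `rle()` rather than the `data.table::rleid()` call in the answer above, and the variable names (`runs`, `ones`) are only illustrative.

```r
# positive values for object 6, copied from the question
positive <- c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1)

# rle() collapses the vector into runs of identical values
runs <- rle(positive)
runs$values   # 0 1 0 1
runs$lengths  # 1 4 2 3

# keep only the lengths of the runs whose value is 1
ones <- runs$lengths[runs$values == 1]
ones          # 4 3

max(ones)     # 4
mean(ones)    # 3.5
```

These match the max and mean shown for object 6 in the expected table.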

CodePudding user response:

I created my own data so the output won't be exactly what you showed. Nevertheless it should do the trick.

library(dplyr)
set.seed(111)  # make the random sample reproducible
df <- data.frame(frame=c(1:10,1:10),
                 object=rep(6:7, each=10),
                 positive=sample(0:1,20, replace=TRUE))
df

   frame object positive
1      1      6        1
2      2      6        1
3      3      6        1
4      4      6        0
5      5      6        1
6      6      6        0
7      7      6        0
8      8      6        0
9      9      6        1
10    10      6        1
11     1      7        1
12     2      7        0
13     3      7        1
14     4      7        0
15     5      7        0
16     6      7        1
17     7      7        0
18     8      7        0
19     9      7        0
20    10      7        1    

# rle() gives the run lengths; keep only the runs whose value is 1
df %>%
  group_by(object) %>%
  summarise(mean = mean(rle(positive)$lengths[rle(positive)$values == 1]),
            max = max(rle(positive)$lengths[rle(positive)$values == 1]))

# A tibble: 2 × 3
  object  mean   max
   <int> <dbl> <int>
1      6     2     3
2      7     1     1
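
The one-liner above calls `rle()` several times per group, and `max()` returns `-Inf` with a warning if an object has no 1s at all. A possible base R variant is sketched below. The helper name `run_stats` and its choice to return 0 and `NA` for a group without 1s are assumptions for illustration, not part of either answer. The data are the values from the question.

```r
# Hypothetical helper: max and mean length of runs equal to `value`
run_stats <- function(x, value = 1) {
  r <- rle(x)                           # compute the runs once
  len <- r$lengths[r$values == value]   # lengths of the matching runs
  if (length(len) == 0) {
    # assumption: report 0 / NA when the value never occurs
    return(data.frame(max = 0L, mean = NA_real_))
  }
  data.frame(max = max(len), mean = mean(len))
}

# data from the question
df <- data.frame(
  object   = rep(6:7, each = 10),
  positive = c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1,
               1, 1, 1, 1, 1, 0, 1, 0, 1, 1)
)

# apply the helper to each object's positive column and bind the rows
pieces <- lapply(split(df$positive, df$object), run_stats)
result <- cbind(object = as.integer(names(pieces)), do.call(rbind, pieces))
result
```

For the question's data this should give max 4 and mean 3.5 for object 6, and max 5 and mean 8/3 (about 2.67) for object 7.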