Home > Back-end >  R, find average length of consecutive time-steps in data.frame
R, find average length of consecutive time-steps in data.frame

Time:03-29

I have the following data.frame with time column sorted in ascending order:

colA=c(1,2,5,6,7,10,13,16,19,20,25,40,43,44,50,51,52,53,68,69,77,79,81,82)
colB=rnorm(24)

df=data.frame(time=colA, x=colB)

How can I count and take the average of the consecutive time-steps observed in the time column?

In detail, I need to group the rows in the time column by consecutive observations, e.g. 1,2 and 5,6,7 and 19,20 and 43,44, etc... and then take the average of the length of each group.

CodePudding user response:

You can group clusters of consecutive observations like this:

df$group <- c(0, cumsum(diff(df$time) != 1))   1

Which gives:

df
#>    time          x group
#> 1     1  0.7443742     1
#> 2     2  0.1289818     1
#> 3     5  1.4882743     2
#> 4     6 -0.6626820     2
#> 5     7 -1.1606550     2
#> 6    10  0.3587742     3
#> 7    13 -0.1948464     4
#> 8    16 -0.2952820     5
#> 9    19  0.4966404     6
#> 10   20  0.4849128     6
#> 11   25  0.0187845     7
#> 12   40  0.6347746     8
#> 13   43  0.7544441     9
#> 14   44  0.8335890     9
#> 15   50  0.9657613    10
#> 16   51  1.2938800    10
#> 17   52 -0.1365510    10
#> 18   53 -0.4401387    10
#> 19   68 -1.2272839    11
#> 20   69 -0.2376531    11
#> 21   77 -0.9268582    12
#> 22   79  0.4112354    13
#> 23   81 -0.1988646    14
#> 24   82 -0.5574496    14

You can get the length of these groups by doing:

rle(df$group)$lengths
#> [1] 2 3 1 1 1 2 1 1 2 4 2 1 1 2

And the average length of the consecutive groups is:

mean(rle(df$group)$lengths)
#> [1] 1.714286

And the average of x within each group using

tapply(df$x, df$group, mean)
#>          1          2          3          4          5          6          7 
#>  0.4366780 -0.1116876  0.3587742 -0.1948464 -0.2952820  0.4907766  0.0187845 
#>          8          9         10         11         12         13         14 
#>  0.6347746  0.7940166  0.4207379 -0.7324685 -0.9268582  0.4112354 -0.3781571 

CodePudding user response:

Another way, doing some kind of consecutive length encoding:

l = diff(c(0, which(diff(df$time) != 1), nrow(df)))
l
# [1] 2 3 1 1 1 2 1 1 2 4 2 1 1 2

mean(l)
#[1] 1.714286
  • Related