Rollapply percentage from logical conditions (Rolling rate in R)

I have a data frame in R with two logical columns that looks like this:

check1 = as.logical(c(rep("TRUE",3),rep("FALSE",2),rep("TRUE",3),rep("FALSE",2)))
check2 = as.logical(c(rep("TRUE",5),rep("FALSE",2),rep("TRUE",3)))
dat = cbind(check1,check2)

which results in:

    check1 check2
 [1,]   TRUE   TRUE
 [2,]   TRUE   TRUE
 [3,]   TRUE   TRUE
 [4,]  FALSE   TRUE
 [5,]  FALSE   TRUE
 [6,]   TRUE  FALSE
 [7,]   TRUE  FALSE
 [8,]   TRUE   TRUE
 [9,]  FALSE   TRUE
[10,]  FALSE   TRUE

I want to calculate the running percentage of TRUEs in each column, which ideally would look like this:

check1	check2
1/1	1/1
2/2	2/2
3/3	3/3
3/4	4/4
3/5	5/5
4/6	5/6
5/7	5/7
6/8	6/8
6/9	7/9
6/10	8/10

Maybe something like this?

dat%>%
  mutate(cumsum(check1)/seq_along(check1))

Any help?

CodePudding user response：

You are almost there; just use across to apply your function to both columns.

Alternatively, you can use dplyr::cummean to compute the running proportions.

A note about terminology: "rolling" usually refers to computing a statistic (such as the mean or the max) within a fixed-size window. Cumulative statistics, on the other hand, are computed over an ever-increasing window that starts at index 1 (the first row). See the dplyr vignette on window functions. Using the right term may help you search the documentation for the appropriate function.
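To make the distinction concrete, here is a minimal base R sketch using the check1 values from the question. The window width of 3 and the names cumulative and rolling are arbitrary choices for illustration, not part of the original question.

```r
# check1 from the question, written out as a logical vector
x <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

# Cumulative proportion: the window always starts at element 1 and grows
cumulative <- cumsum(x) / seq_along(x)
# 1, 1, 1, 0.75, 0.6, 0.667, 0.714, 0.75, 0.667, 0.6

# Rolling proportion: a fixed window over the last `width` elements
# (width = 3 is an assumption made only for this example)
width <- 3
rolling <- sapply(seq_along(x), function(i) {
  if (i < width) NA_real_ else mean(x[(i - width + 1):i])
})
# NA, NA, 1, 0.667, 0.333, 0.333, 0.667, 1, 0.667, 0.333
```

The cumulative version matches the output requested in the question; the rolling version forgets anything older than the last three rows.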

library("tidyverse")

check1 <- as.logical(c(rep("TRUE", 3), rep("FALSE", 2), rep("TRUE", 3), rep("FALSE", 2)))
check2 <- as.logical(c(rep("TRUE", 5), rep("FALSE", 2), rep("TRUE", 3)))
dat <- cbind(check1, check2)

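# Option 1: the approach from the question, applied to both columns
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), ~ cumsum(.) / row_number())
  )

# Option 2: the same result with dplyr::cummean (overwrites the result above)
cummeans <- as_tibble(dat) %>%
  mutate(
    across(c(check1, check2), cummean)
  )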
cummeans
#> # A tibble: 10 × 2
#>    check1 check2
#>     <dbl>  <dbl>
#>  1  1      1    
#>  2  1      1    
#>  3  1      1    
#>  4  0.75   1    
#>  5  0.6    1    
#>  6  0.667  0.833
#>  7  0.714  0.714
#>  8  0.75   0.75 
#>  9  0.667  0.778
#> 10  0.6    0.8

# Plot the cumulative proportions on the y-axis, with one colored line for each check
cummeans %>%
  # The example data has no index column; will use the row ids instead
  rowid_to_column() %>%
  pivot_longer(
    c(check1, check2),
    names_to = "check",
    values_to = "cummean"
  ) %>%
  ggplot(
    aes(rowid, cummean, color = check)
  ) +
  geom_line() +
  # Proportions have a natural range from 0 to 1
  scale_y_continuous(
    limits = c(0, 1)
  )

Created on 2022-03-14 by the reprex package (v2.0.1)

CodePudding user response：

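This gives the result as proportions between 0 and 1. It uses rollapplyr from the zoo package with a vector of window widths (1, 2, ..., nrow(dat)), so the right-aligned window ending at each row reaches back to the first row.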

library(zoo)

rollapplyr(dat, 1:nrow(dat), mean)
##          check1    check2
##  [1,] 1.0000000 1.0000000
##  [2,] 1.0000000 1.0000000
##  [3,] 1.0000000 1.0000000
##  [4,] 0.7500000 1.0000000
##  [5,] 0.6000000 1.0000000
##  [6,] 0.6666667 0.8333333
##  [7,] 0.7142857 0.7142857
##  [8,] 0.7500000 0.7500000
##  [9,] 0.6666667 0.7777778
## [10,] 0.6000000 0.8000000

To get a percentage, multiply that by 100:

100 * rollapplyr(dat, 1:nrow(dat), mean)
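
If the literal "count/total" form shown in the question is wanted instead of a decimal, one possible base R sketch is below. It rebuilds dat so the example is self-contained; the name fractions is a placeholder introduced here, and the result is a character matrix meant for display only, not for further arithmetic.

```r
# Rebuild the logical matrix from the question
check1 <- rep(c(TRUE, FALSE, TRUE, FALSE), times = c(3, 2, 3, 2))
check2 <- rep(c(TRUE, FALSE, TRUE), times = c(5, 2, 3))
dat <- cbind(check1, check2)

# For each column, paste the running count of TRUEs over the running row count
fractions <- apply(dat, 2, function(col) {
  paste0(cumsum(col), "/", seq_along(col))
})

fractions[4, ]   # check1 is "3/4", check2 is "4/4"
fractions[10, ]  # check1 is "6/10", check2 is "8/10"
```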