I have a data frame in R with two columns with logical conditions that looks like this :
check1 = as.logical(c(rep("TRUE",3),rep("FALSE",2),rep("TRUE",3),rep("FALSE",2)))
check2 = as.logical(c(rep("TRUE",5),rep("FALSE",2),rep("TRUE",3)))
dat = cbind(check1,check2)
resulting to :
check1 check2
[1,] TRUE TRUE
[2,] TRUE TRUE
[3,] TRUE TRUE
[4,] FALSE TRUE
[5,] FALSE TRUE
[6,] TRUE FALSE
[7,] TRUE FALSE
[8,] TRUE TRUE
[9,] FALSE TRUE
[10,] FALSE TRUE
I want to roll calculate the percentage of TRUEs on each column which ideally must look like this :
check1 | check2 |
---|---|
1/1 | 1/1 |
2/2 | 2/2 |
3/3 | 3/3 |
3/4 | 4/4 |
3/5 | 5/5 |
4/6 | 5/6 |
5/7 | 5/7 |
6/8 | 6/8 |
6/9 | 7/9 |
6/10 | 8/10 |
maybe ...
dat%>%
mutate(cumsum(check1)/seq_along(check1))
Any help ?
CodePudding user response:
You are almost there; just use across
to apply your function to both columns.
Alternatively, you can use dplyr::cummean
to compute the running proportions.
A note about terminology: rolling usually refers to computing a statistic (such as the mean or the max) within a fixed-size window. On the other hand, cumulative statistics are computed in an ever-increasig window starting from index 1 (or the first row). See the vignette on window functions. Using the right term may help you to search the documentation for the appropriate function.
library("tidyverse")
check1 <- as.logical(c(rep("TRUE", 3), rep("FALSE", 2), rep("TRUE", 3), rep("FALSE", 2)))
check2 <- as.logical(c(rep("TRUE", 5), rep("FALSE", 2), rep("TRUE", 3)))
dat <- cbind(check1, check2)
cummeans <- as_tibble(dat) %>%
mutate(
across(c(check1, check2), ~ cumsum(.) / row_number())
)
cummeans <- as_tibble(dat) %>%
mutate(
across(c(check1, check2), cummean)
)
cummeans
#> # A tibble: 10 × 2
#> check1 check2
#> <dbl> <dbl>
#> 1 1 1
#> 2 1 1
#> 3 1 1
#> 4 0.75 1
#> 5 0.6 1
#> 6 0.667 0.833
#> 7 0.714 0.714
#> 8 0.75 0.75
#> 9 0.667 0.778
#> 10 0.6 0.8
# Plot the cumulative proportions on the y-axis, with one panel for each check
cummeans %>%
# The example data has no index column; will use the row ids instead
rowid_to_column() %>%
pivot_longer(
c(check1, check2),
names_to = "check",
values_to = "cummean"
) %>%
ggplot(
aes(rowid, cummean, color = check)
)
geom_line()
# Proportions have a natural range from 0 to 1
scale_y_continuous(
limits = c(0, 1)
)
Created on 2022-03-14 by the reprex package (v2.0.1)
CodePudding user response:
This gives the result as fractions.
library(zoo)
rollapplyr(dat, 1:nrow(dat), mean)
## check1 check2
## [1,] 1.0000000 1.0000000
## [2,] 1.0000000 1.0000000
## [3,] 1.0000000 1.0000000
## [4,] 0.7500000 1.0000000
## [5,] 0.6000000 1.0000000
## [6,] 0.6666667 0.8333333
## [7,] 0.7142857 0.7142857
## [8,] 0.7500000 0.7500000
## [9,] 0.6666667 0.7777778
## [10,] 0.6000000 0.8000000
To get a percentage multiply that by 100:
100 * rollapplyr(dat, 1:nrow(dat), mean)