I have an R data frame:
a <- 1:6
x <- c(0.2,3.7,4.6,1.1,2.2,5.3)
df1 <- data.frame(a,x)
df1
  a   x
1 1 0.2
2 2 3.7
3 3 4.6
4 4 1.1
5 5 2.2
6 6 5.3
I would like to count how many times the value in "x" is greater than 3.2, but per group of 3 consecutive rows rather than per row. For example, if rows 1, 2, 3 form "block 1" and rows 4, 5, 6 form "block 2", I want the number of "x" values above 3.2 in each block, giving something like:
block count
1 2
2 1
My real data has many more rows, so ideally I need a command that creates consecutive blocks of 3 rows automatically, instead of defining the blocks by hand.
Can you advise how to compute this? It was suggested that I use plyr, but I have not been successful.
I hope this is clear enough.
Thank you!
CodePudding user response:
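library(dplyr)

df1 %>%
  mutate(block = rep(1:2, each = 3) %>% factor) %>%  # rows 1-3 are block 1, rows 4-6 are block 2
  group_by(block) %>%
  summarise(count = sum(x > 3.2))  # TRUE counts as 1, so sum() counts the matches
# A tibble: 2 x 2
  block count
  <fct> <int>
1 1         2
2 2         1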
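Note that rep(1:2, each = 3) hard-codes two blocks, so it only fits a 6-row data frame. A minimal sketch of one way to derive the block number from the row position instead, assuming dplyr is installed; block_size and result are hypothetical names introduced here for illustration:

```r
library(dplyr)

df1 <- data.frame(a = 1:6, x = c(0.2, 3.7, 4.6, 1.1, 2.2, 5.3))
block_size <- 3  # hypothetical name: number of rows per block

result <- df1 %>%
  # rows 1-3 map to block 1, rows 4-6 to block 2, and so on
  mutate(block = ceiling(row_number() / block_size)) %>%
  group_by(block) %>%
  summarise(count = sum(x > 3.2))

print(result)
```

Because the block index comes from row_number(), the same pipeline should work for any number of rows, including a final block with fewer than 3 rows.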
CodePudding user response:
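library(dplyr)

df1 %>%
  # gl(n, k, length) builds a factor with each level repeated k times, cut to `length` values
  mutate(block = as.integer(gl(n(), 3, n()))) %>%
  group_by(block) %>%
  summarise(count = sum(x > 3.2))
  block count
  <int> <int>
1     1     2
2     2     1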
CodePudding user response:
A one-liner:
data.frame(block = 1:(nrow(df1)/3), count = rowSums(matrix(df1$x, ncol = 3, byrow = TRUE) > 3.2))
#>   block count
#> 1     1     2
#> 2     2     1
UPDATE: Using a wide matrix (3 rows, one column per block, since matrix() fills column by column) with colSums is somewhat faster:
data.frame(block = 1:(nrow(df1)/3), count = colSums(matrix(df1$x, nrow = 3) > 3.2))
#>   block count
#> 1     1     2
#> 2     2     1
x <- runif(6e5, 0, 5)
microbenchmark::microbenchmark(
  byrow = data.frame(block = 1:(length(x)/3), count = rowSums(matrix(x, ncol = 3, byrow = TRUE) > 3.2)),
  bycol = data.frame(block = 1:(length(x)/3), count = colSums(matrix(x, nrow = 3) > 3.2))
)
#> Unit: milliseconds
#>   expr      min       lq     mean   median        uq      max neval
#>  byrow 5.121082 5.410939 9.250643 7.526198 10.409384 69.91496   100
#>  bycol 3.280639 3.539040 5.536414 5.320978  6.279585 12.38163   100
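Both matrix versions assume the number of values is a multiple of 3; otherwise matrix() recycles values to fill the last row or column and issues a warning, which can distort the last count. A minimal base R sketch that tolerates a shorter final block, using tapply instead of a matrix. The seventh value (4.0) is made up for illustration, and block, counts and result are hypothetical names:

```r
# Seven values: the last block holds only one element (4.0 is a made-up extra value)
x <- c(0.2, 3.7, 4.6, 1.1, 2.2, 5.3, 4.0)

# Block index for each position: 1 1 1 2 2 2 3
block <- ceiling(seq_along(x) / 3)

# Sum the logical vector within each block (TRUE counts as 1)
counts <- tapply(x > 3.2, block, sum)

result <- data.frame(block = as.integer(names(counts)),
                     count = as.vector(counts))
print(result)
```

This trades some of the matrix approach's speed for not having to pad or trim the input.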
CodePudding user response:
A base R option using aggregate:
aggregate(cbind(count = x > 3.2) ~ cbind(block = ceiling(seq_along(a) / 3)), df1, sum)
  block count
1     1     2
2     2     1
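The one-liner packs the block index and the comparison into the formula. A more step-by-step sketch of the same idea, which may be easier to adapt; the helper column above and the name result are hypothetical, introduced only for this example:

```r
df1 <- data.frame(a = 1:6, x = c(0.2, 3.7, 4.6, 1.1, 2.2, 5.3))

# Step 1: assign each row to a block of 3 consecutive rows
df1$block <- ceiling(seq_len(nrow(df1)) / 3)

# Step 2: flag the rows where x exceeds the threshold
df1$above <- df1$x > 3.2

# Step 3: sum the flags within each block (TRUE counts as 1)
result <- aggregate(above ~ block, data = df1, FUN = sum)
names(result)[2] <- "count"
print(result)
```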