Counting values according to a condition, but in blocks of 3 rows

Time:11-17

I have an R data frame:

a <-1:6
x <- c(0.2,3.7,4.6,1.1,2.2,5.3)
df1 <- data.frame(a,x)

df1
  a   x
1 1  0.2
2 2  3.7
3 3  4.6
4 4  1.1
5 5  2.2
6 6  5.3

I would like to count how many times the value in "x" is greater than 3.2, but per group of 3 rows rather than per row. For example, with rows 1, 2, 3 as "block 1" and rows 4, 5, 6 as "block 2", I want the number of "x" values above 3.2 in each block, obtaining something like:

block  count
 1      2 
 2      1
       

In my original data I have many more entries, so ideally I need a command that creates consecutive blocks of 3 rows automatically, instead of defining the blocks manually.

Can you advise how to compute this? It was suggested that I use plyr, but I have not been successful.

I hope this is clear enough.

Thank you!

CodePudding user response:

> library(dplyr)
> # block labels are hard-coded here: 2 blocks of 3 rows each
> df1 %>% 
    mutate(block = rep(1:2, each=3) %>% factor) %>% 
    group_by(block) %>% 
    summarise(count = sum(x>3.2))
# A tibble: 2 x 2
  block count
  <fct> <int>
1 1         2
2 2         1
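The `rep(1:2, each=3)` call above is tied to exactly six rows. A minimal sketch of a block index that adapts to any number of rows, assuming base R only; the variable name `block_size` is hypothetical, and a trailing partial block would simply get its own index:

```r
# Example data from the question
df1 <- data.frame(a = 1:6, x = c(0.2, 3.7, 4.6, 1.1, 2.2, 5.3))

block_size <- 3  # hypothetical name: number of rows per block

# Rows 1-3 map to 1, rows 4-6 map to 2, and so on, for any nrow(df1)
block <- (seq_len(nrow(df1)) - 1) %/% block_size + 1

# Count the values above the threshold within each block
counts <- tapply(df1$x > 3.2, block, sum)
result <- data.frame(block = as.integer(names(counts)),
                     count = as.vector(counts))
result
```

The same `block` vector could be passed to `mutate()` in the pipeline above instead of the hard-coded `rep()` call.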

CodePudding user response:

# gl(n(), 3, n()) builds a factor whose levels each repeat 3 times, cut to n() rows
df1 %>% 
  mutate(block = as.integer(gl(n(), 3, n()))) %>% 
  group_by(block) %>% 
  summarise(count = sum(x>3.2))
  block count
  <int> <int>
1     1     2
2     2     1

CodePudding user response:

A one-liner:

data.frame(block = 1:(nrow(df1)/3), count = rowSums(matrix(df1$x, ncol = 3, byrow = TRUE) > 3.2))
#>   block count
#> 1     1     2
#> 2     2     1

UPDATE: Using a wide matrix is somewhat faster:

data.frame(block = 1:(nrow(df1)/3), count = colSums(matrix(df1$x, nrow = 3) > 3.2))
#>   block count
#> 1     1     2
#> 2     2     1

x <- runif(6e5, 0, 5)
microbenchmark::microbenchmark(byrow = data.frame(block = 1:(length(x)/3), count = rowSums(matrix(x, ncol = 3, byrow = TRUE) > 3.2)),
                               bycol = data.frame(block = 1:(length(x)/3), count = colSums(matrix(x, nrow = 3) > 3.2)))
#> Unit: milliseconds
#>   expr      min       lq     mean   median        uq      max neval
#>  byrow 5.121082 5.410939 9.250643 7.526198 10.409384 69.91496   100
#>  bycol 3.280639 3.539040 5.536414 5.320978  6.279585 12.38163   100
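Both matrix variants assume that the number of values is a multiple of 3; otherwise `matrix()` would typically recycle values and emit a warning. A possible sketch for the general case pads the vector with `NA` before reshaping. The seventh value and the names `block_size`, `n_blocks` and `padded` are made up for illustration and are not part of the original answer:

```r
# Question data plus one made-up extra value, so the last block is partial
x <- c(0.2, 3.7, 4.6, 1.1, 2.2, 5.3, 4.0)

block_size <- 3
n_blocks <- ceiling(length(x) / block_size)

# Pad with NA so the length becomes a multiple of block_size
padded <- c(x, rep(NA_real_, n_blocks * block_size - length(x)))

# One column per block; na.rm = TRUE ignores the padding when counting
counts <- colSums(matrix(padded, nrow = block_size) > 3.2, na.rm = TRUE)
result <- data.frame(block = seq_len(n_blocks), count = counts)
result
```

The padding only affects the last column, so full blocks are counted exactly as in the wide-matrix version above.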

CodePudding user response:

A base R option using aggregate

> aggregate(cbind(count = x>3.2)~cbind(block = ceiling(seq_along(a)/3)),df1,sum)
  block count
1     1     2
2     2     1
Tags: r