Counting number of rows with values greater or equal to each value in a sequence-CodePudding

I'm trying to count the number of rows containing a value greater or equal to a number in a defined sequence and grouped on a second variable. For example the number of rows with values greater or equal to 300, 400, 500 grouped by Company A, Company B and Company C found in another column. In Excel I would just use the COUNTIFS function but I'd like not having to go via Excel for this task.

Example dataset and desired result:

library(tidyverse)

df <- tibble::tribble(
        ~Company, ~Sales,
             "B",   902L,
             "B",   575L,
             "C",   194L,
             "C",   215L,
             "A",   515L,
             "B",   728L,
             "A",   910L,
             "C",   889L,
             "A",   854L,
             "B",   230L,
             "C",   188L,
             "C",   442L,
             "A",   174L,
             "A",   723L,
             "B",   904L,
             "A",   761L,
             "B",   987L,
             "B",   521L,
             "B",   694L,
             "B",   530L,
             "C",   165L,
             "A",   507L,
             "B",   316L,
             "A",   452L,
             "A",   342L,
             "B",   413L,
             "B",   121L,
             "A",   650L,
             "B",   801L,
             "C",   100L
        )

result <- tibble::tribble(
             ~Company, ~Greater.or.equal.to, ~Count,
                  "A",                 300L,     9L,
                  "A",                 400L,     8L,
                  "A",                 500L,     7L,
                  "A",                 600L,     5L,
                  "A",                 700L,     4L,
                  "A",                 800L,     2L,
                  "A",                 900L,     1L,
                  "A",                1000L,     0L,
                  "B",                 300L,    11L,
                  "B",                 400L,    10L,
                  "B",                 500L,     9L,
                  "B",                 600L,     6L,
                  "B",                 700L,     5L,
                  "B",                 800L,     4L,
                  "B",                 900L,     3L,
                  "B",                1000L,     0L,
                  "C",                 300L,     2L,
                  "C",                 400L,     2L,
                  "C",                 500L,     1L,
                  "C",                 600L,     1L,
                  "C",                 700L,     1L,
                  "C",                 800L,     1L,
                  "C",                 900L,     0L,
                  "C",                1000L,     0L
             )

I know how to find individual rows using either base R or dplyr (much more familiar with the Tidyverse) but haven't been able to figure out a way to check for a sequence of values. I've tried creating a for loop in the hope of getting the right answer but is clearly doing something wrong.

# These two versions work but is inefficient for when we have a long sequence of variables to check against
length(which(df$Company == "A" & df$Sales >= 300))

df %>% 
  group_by(Company) %>% 
  summarise(count = sum(Sales >= 300))

# Attempt at a loop
# Sequence of values to loop over. The number sequence can change as the column we're checking
# against are changing
sequence <- seq(300, 1000, 100)
companies <- c("A", "B", "C")

counting <- function(data, col1, col2, range1, range2){
  for (i in range1){
    for (j in range2){
      length(which(data$col1 == i & data$col2 >= j))
    }
  }
}

counting(df, Company, Sales, companies, sequence)

Any advice is greatly appreciated!

CodePudding user response：

We may loop over the seq from 300 to 1000 by 100, filter the data after grouping by 'Company' using the looped value, create a summarised column with number of rows (n()), bind the list elements and use complete to fill the missing combinations with '0' for 'Count' column

library(dplyr)
library(purrr)
library(tidyr)
out <- map(seq(300, 1000, by = 100), ~ 
      df %>%
        group_by(Company) %>% 
        filter(Sales >= .x) %>% 
       summarise(Greater.or.equal.to = .x, Count = n())) %>%
     bind_rows %>%
     complete(Company, Greater.or.equal.to = seq(300, 1000,
               by = 100), fill = list(Count = 0))

-output

out
# A tibble: 24 × 3
   Company Greater.or.equal.to Count
   <chr>                 <dbl> <dbl>
 1 A                       300     9
 2 A                       400     8
 3 A                       500     7
 4 A                       600     5
 5 A                       700     4
 6 A                       800     2
 7 A                       900     1
 8 A                      1000     0
 9 B                       300    11
10 B                       400    10
# … with 14 more rows
> all.equal(out, result)
[1] TRUE