I'm trying to count the number of rows containing a value greater or equal to a number in a defined sequence and grouped on a second variable. For example the number of rows with values greater or equal to 300, 400, 500 grouped by Company A, Company B and Company C found in another column. In Excel I would just use the COUNTIFS function but I'd like not having to go via Excel for this task.
Example dataset and desired result:
library(tidyverse)
df <- tibble::tribble(
~Company, ~Sales,
"B", 902L,
"B", 575L,
"C", 194L,
"C", 215L,
"A", 515L,
"B", 728L,
"A", 910L,
"C", 889L,
"A", 854L,
"B", 230L,
"C", 188L,
"C", 442L,
"A", 174L,
"A", 723L,
"B", 904L,
"A", 761L,
"B", 987L,
"B", 521L,
"B", 694L,
"B", 530L,
"C", 165L,
"A", 507L,
"B", 316L,
"A", 452L,
"A", 342L,
"B", 413L,
"B", 121L,
"A", 650L,
"B", 801L,
"C", 100L
)
result <- tibble::tribble(
~Company, ~Greater.or.equal.to, ~Count,
"A", 300L, 9L,
"A", 400L, 8L,
"A", 500L, 7L,
"A", 600L, 5L,
"A", 700L, 4L,
"A", 800L, 2L,
"A", 900L, 1L,
"A", 1000L, 0L,
"B", 300L, 11L,
"B", 400L, 10L,
"B", 500L, 9L,
"B", 600L, 6L,
"B", 700L, 5L,
"B", 800L, 4L,
"B", 900L, 3L,
"B", 1000L, 0L,
"C", 300L, 2L,
"C", 400L, 2L,
"C", 500L, 1L,
"C", 600L, 1L,
"C", 700L, 1L,
"C", 800L, 1L,
"C", 900L, 0L,
"C", 1000L, 0L
)
I know how to find individual rows using either base R or dplyr (much more familiar with the Tidyverse) but haven't been able to figure out a way to check for a sequence of values. I've tried creating a for loop in the hope of getting the right answer but is clearly doing something wrong.
# These two versions work but is inefficient for when we have a long sequence of variables to check against
length(which(df$Company == "A" & df$Sales >= 300))
df %>%
group_by(Company) %>%
summarise(count = sum(Sales >= 300))
# Attempt at a loop
# Sequence of values to loop over. The number sequence can change as the column we're checking
# against are changing
sequence <- seq(300, 1000, 100)
companies <- c("A", "B", "C")
counting <- function(data, col1, col2, range1, range2){
for (i in range1){
for (j in range2){
length(which(data$col1 == i & data$col2 >= j))
}
}
}
counting(df, Company, Sales, companies, sequence)
Any advice is greatly appreciated!
CodePudding user response:
We may loop over the seq
from 300 to 1000 by
100, filter
the data after grouping by 'Company' using the looped value, create a summarise
d column with number of rows (n()
), bind the list
elements and use complete
to fill
the missing combinations with '0' for 'Count' column
library(dplyr)
library(purrr)
library(tidyr)
out <- map(seq(300, 1000, by = 100), ~
df %>%
group_by(Company) %>%
filter(Sales >= .x) %>%
summarise(Greater.or.equal.to = .x, Count = n())) %>%
bind_rows %>%
complete(Company, Greater.or.equal.to = seq(300, 1000,
by = 100), fill = list(Count = 0))
-output
out
# A tibble: 24 × 3
Company Greater.or.equal.to Count
<chr> <dbl> <dbl>
1 A 300 9
2 A 400 8
3 A 500 7
4 A 600 5
5 A 700 4
6 A 800 2
7 A 900 1
8 A 1000 0
9 B 300 11
10 B 400 10
# … with 14 more rows
> all.equal(out, result)
[1] TRUE