I'm working with a data set that contains CustomerID, Sales_Rep, Product, and year columns. The problem I have with this dataset is that there is no unique Transaction Number. The data looks like this:
CustomerID Sales Rep Product Year
301978 Richard Grayson Product A 2017
302151 Maurin Thompkins Product B 2018
301962 Wallace West Product C 2019
301978 Richard Grayson Product B 2018
402152 Maurin Thompkins Product A 2017
501967 Wallace West Product B 2017
301978 Richard Grayson Product B 2018
What I'm trying to do is count how many transactions were made by each Sales Rep, per year by counting the number of Customer IDs that appear for each Sales Rep per year regardless if the customer ID is repeated, and then compile it into one data frame called "Count". I tried using the following functions in R:
Count <- Sales_Data %>%
group_by(Sales_Rep, year) %>%
summarize(count(CustomerID))
but I get this error:
Error: Problem with `summarise()` input `..1`.
i `..1 = count(PatientID)`.
x no applicable method for 'count' applied to an object of class "c('integer', 'numeric')"
The result I want to produce is this:
Sales Rep 2017 2018 2019
Richard Grayson 1 2
Maurin Thompkins 1 1
Wallace West 1 1
Can anybody help me?
CodePudding user response:
There is no need to group and summarise, function count
does that in one step. Then reshape to wide format.
Sales_Data <- read.table(text = "
CustomerID 'Sales Rep' Product Year
301978 'Richard Grayson' 'Product A' 2017
302151 'Maurin Thompkins' 'Product B' 2018
301962 'Wallace West' 'Product C' 2019
301978 'Richard Grayson' 'Product B' 2018
402152 'Maurin Thompkins' 'Product A' 2017
501967 'Wallace West' 'Product B' 2017
301978 'Richard Grayson' 'Product B' 2018
", header = TRUE, check.names = FALSE)
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
Sales_Data %>% count(CustomerID)
#> CustomerID n
#> 1 301962 1
#> 2 301978 3
#> 3 302151 1
#> 4 402152 1
#> 5 501967 1
Sales_Data %>%
count(`Sales Rep`, Year) %>%
pivot_wider(id_cols = `Sales Rep`, names_from = Year, values_from = n)
#> # A tibble: 3 x 4
#> `Sales Rep` `2017` `2018` `2019`
#> <chr> <int> <int> <int>
#> 1 Maurin Thompkins 1 1 NA
#> 2 Richard Grayson 1 2 NA
#> 3 Wallace West 1 NA 1
Created on 2022-04-03 by the reprex package (v2.0.1)