I have a table of store visits: each row records one visit by a customer in a given week.
Week | Customer |
---|---|
1 | A |
1 | A |
1 | B |
1 | C |
1 | D |
2 | A |
3 | B |
3 | G |
4 | A |
4 | A |
4 | K |
4 | C |
5 | A |
5 | B |
6 | A |
Here is the dataframe for that:
table <- data.frame(
  week = c(1, 1, 1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
  client = c("A", "A", "B", "C", "D", "A", "B", "G", "A", "A", "K", "C", "A", "B", "A")
)
I want to create a table that shows, for each week, how many customers fall into each visit bucket, regardless of how many times they came within a single week:
Week | # Customers with 1-2 visits | # Customers with 3-4 visits | # Customers with 5 visits |
---|---|---|---|
1 | 4 | 0 | 0 |
2 | 4 | 0 | 0 |
3 | 5 | 0 | 0 |
4 | 5 | 1 | 0 |
5 | 4 | 2 | 0 |
6 | 4 | 1 | 1 |
To explain further: week 2 shows 4 because it carries over the information from week 1; week 3 carries over the information from weeks 1 and 2, and so on. That is why customer A moves to the 3-4 group in week 4: by then A has visited in weeks 1, 2 and 4. I do not count the number of visits a customer makes within a week; I only care whether the customer showed up that week or not.
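To make the rule concrete, here is a small base R sketch for customer A alone. The variable names (`weeks_A`, `showed_up`, `cum_weeks`, `bucket`) are only for illustration, and the "5+" label assumes the last bucket means five or more weeks.

```r
# Weeks in which customer A appears in the data (duplicates are visits in the same week)
weeks_A <- c(1, 1, 2, 4, 4, 5, 6)
weeks <- 1:6

# TRUE if A showed up at least once in that week
showed_up <- weeks %in% weeks_A

# Running count of distinct weeks with a visit
cum_weeks <- cumsum(showed_up)
cum_weeks
# 1 2 2 3 4 5

# Bucket each week's running count: (0,2], (2,4], (4,Inf]
bucket <- cut(cum_weeks, breaks = c(0, 2, 4, Inf), labels = c("1-2", "3-4", "5+"))
as.character(bucket)
# "1-2" "1-2" "1-2" "3-4" "3-4" "5+"
```

So A counts towards the 1-2 group in weeks 1 to 3, the 3-4 group in weeks 4 and 5, and the last group in week 6.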
I would love to know the way to do this one. Thank you so much!
I tried using dplyr to group by week and then sum an ifelse() for each bucket. However, the result was not correct: it did not take the earlier weeks into account.
table %>%
  group_by(client) %>%
  mutate(count = row_number()) %>%
  ungroup() %>%
  group_by(week) %>%
  summarize(from_1_to_2 = sum(ifelse(count >= 1 & count <= 2, 1, 0)),
            from_3_to_4 = sum(ifelse(count >= 3 & count <= 4, 1, 0)),
            from_5_or_more = sum(ifelse(count >= 5, 1, 0)))
Answer:
You need to expand the data so that, for every week, you know whether each customer visited or not. The weeks with a visit can be taken from your data above. Supplement them with the missing client-week combinations, assuming no visit for those weeks.
library(dplyr)

# One row per (week, client) pair with at least one visit
weeklyVisits <- table |>
  group_by(week, client) |>
  summarize(visited = TRUE, .groups = "drop")

# Every client-week combination without a visit, marked FALSE
noVisitData <- purrr::map_dfr(
  unique(table$client),
  ~data.frame(client = .x, week = seq(max(table$week)), visited = FALSE)
) |>
  anti_join(weeklyVisits, by = c("week", "client"))
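As a possible alternative to building the missing rows by hand, the same expansion can be sketched with `tidyr::complete()`. This assumes tidyr is installed; the names `visits` and `allWeeks` are made up for this example. Note that `full_seq(week, 1)` runs from the smallest to the largest week in the data, which here is 1 to 6.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, under a different (hypothetical) name
visits <- data.frame(
  week = c(1, 1, 1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
  client = c("A", "A", "B", "C", "D", "A", "B", "G", "A", "A", "K", "C", "A", "B", "A")
)

allWeeks <- visits |>
  distinct(week, client) |>          # one row per client-week with a visit
  mutate(visited = TRUE) |>
  complete(
    week = full_seq(week, 1),        # every week from min to max
    client,                          # every client seen in the data
    fill = list(visited = FALSE)     # combinations with no visit
  )

nrow(allWeeks)          # 6 weeks x 6 clients = 36
sum(allWeeks$visited)   # 13 client-weeks with a visit
```

The result has the same shape as `weeklyVisits` and `noVisitData` bound together, so the cumulative step should work on it unchanged.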
With this you can find the cumulative number of weeks in which each client visited:
clientCumVisits <- weeklyVisits |>
  bind_rows(noVisitData) |>
  group_by(client) |>
  arrange(week) |>
  mutate(cumVisits = cumsum(visited)) |>
  print()
Lastly, summarize with your buckets.
clientCumVisits |>
  group_by(week) |>
  summarize(
    '1-2 Visits' = sum(cumVisits >= 1 & cumVisits <= 2),
    '3-4 Visits' = sum(cumVisits >= 3 & cumVisits <= 4),
    '5 Visits' = sum(cumVisits >= 5)
  )
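As a cross-check, the same counts can be reproduced in base R from a client-by-week presence matrix. This is only a sketch: the data frame is called `visits` here (so that the base `table()` function reads unambiguously), and the column names `visits_1_2`, `visits_3_4` and `visits_5_plus` are made up for the example.

```r
visits <- data.frame(
  week = c(1, 1, 1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6),
  client = c("A", "A", "B", "C", "D", "A", "B", "G", "A", "A", "K", "C", "A", "B", "A")
)

all_weeks <- seq_len(max(visits$week))

# Rows = clients, columns = weeks; TRUE if the client visited at least once that week
present <- unclass(table(visits$client, factor(visits$week, levels = all_weeks))) > 0

# cumsum along each client's row; apply() returns weeks in rows, clients in columns
cum_weeks <- apply(present, 1, cumsum)

result <- data.frame(
  week = all_weeks,
  visits_1_2 = rowSums(cum_weeks >= 1 & cum_weeks <= 2),
  visits_3_4 = rowSums(cum_weeks >= 3 & cum_weeks <= 4),
  visits_5_plus = rowSums(cum_weeks >= 5)
)
result
# visits_1_2:    4 4 5 5 4 4
# visits_3_4:    0 0 0 1 2 1
# visits_5_plus: 0 0 0 0 0 1
```

These numbers match the expected table in the question.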