I have data on customers, their orders and products they ordered and I want to determine the return rate per customer. With count I can determine how many times each customer id has returned (1) and did not return (0) their item. However this is in a separate tibble. I want to include these return rates (number of times returned (1))/(total orders (1 0)) and link these to the customer id. How do I do this? The following is the code I used to get the customer id and returns, I just need to know how to link these back into the dataset.
returnrate <- data4 %>%
count(customer_id, returnyesno)
returnrate
with the following partial output
customer_id returnyesno n
<dbl> <dbl> <int>
1 212807 0 2
2 2108514 0 2
3 2593268 0 2
4 3155787 0 1
5 5769078 0 4
6 6059185 0 6
7 6059185 1 8
CodePudding user response:
Is this what you are looking for?
library(tidyverse)
df <- returnrate %>%
group_by(customer_id) %>%
mutate(return_ratio = sum(returnyesno * n)/ sum(n))
df
# A tibble: 7 x 4
# Groups: customer_id [6]
customer_id returnyesno n return_ratio
<int> <int> <int> <dbl>
1 212807 0 2 0
2 2108514 0 2 0
3 2593268 0 2 0
4 3155787 0 1 0
5 5769078 0 4 0
6 6059185 0 6 0.571
7 6059185 1 8 0.571
Then if you want to add this to the data4
table, you can run the following code:
data4 <- left_join(data4, df %>% select(customer_id, return_ratio) %>% unique())