I'm trying to find the average total of an order across the entire data set. Here is what a little bit of the data set looks like:
order_id | qty | price | discount |
---|---|---|---|
20752 | 1 | 58 | 5 |
20753 | 1 | 34 | 0 |
20753 | 1 | 64 | 0 |
20753 | 1 | 78 | 0 |
20754 | 2 | 34 | 0 |
20755 | 1 | 62 | 0 |
What I'm looking to do is find the average total of each order.
I'm able to find the average of each order by using this code but I'm not sure how to find the average of each order across the entire data set.
'''
project %>%
group_by(order_id) %>%
summarise(Avg_total = mean(price, na.rm=TRUE))
'''
This gives me the following table:
order_id | Avg_total |
---|---|
20752 | 58 |
20753 | 58.7 |
20754 | 34 |
20755 | 62 |
However, what I want is this:
Avg_total |
---|
89.75 |
Here is how I calculated that number -
order 20752 has a total of 53
order 20753 has a total of 176
order 20754 has a total of 68
order 20755 has a total of 62
53 176 68 62 = 359 / 4 (# of orders) = 89.75 average total per order across the whole set.
If possible, I would also like to filter out any order totals above 500 to remove any outliers.
CodePudding user response:
So, you have the following code:
Book1 %>%
group_by(Book1$order_id)%>%
summarise(Avg_total=mean(price, na.rm=TRUE))
which gives you the grouping summarizing it by average of price for each order_id (e.g. 20752 -> 58/1 = 58, 20753 -> (34 64 78)/3 = 58.66666 = 58.7)
For your case I would first create a new column with the values price-discount as follows:
Book1$final_price = Book1$price-Book1$discount
The code simply subtracts the discount from price and adds it to a new column called final_price. R handles the operation row by row.
Which gives the following result:
This, by following the code that you shared gives a different table (meaning that the discount is removed from the price):
Book1 %>%
group_by(Book1$order_id)%>%
summarise(Avg_total=mean(final_price, na.rm=TRUE))
What you are having though is 58 (34 64 78) 34 62 = 330 (for not removing discount) or 53 (34 64 78) 34 62 = 325 (by removing discount).
Therefore you mean would be (53 176 34 62)/4 = 81.25
You could find the desired output:
Book1%>%
group_by(Book1$order_id)%>%
summarise(Sum_total=sum(final_price, na.rm=TRUE))%>%
summarise(Avg = mean(Sum_total, na.rm=TRUE))
What happens here is that we are first grouping by the order_id, then creating the sum (and not mean) for every order_id and then doing the mean.
Please be careful, this finds the mean of totals for order_ids and not the mean of totals for every order, which should be the following:
Book1%>%
summarise(Avg = mean(price, na.rm=TRUE))
In order to remove rows you could run the following before any arithmetic operations:
x <-subset(Book1, price<=500)
CodePudding user response:
I was able to actually able to find the answer using this code
Avg_order <- project %>%
group_by(order_id) %>%
summarise(Sum_total= ((price * qty) - discount), na.rm=TRUE) %>%
summarise(Avg = mean(Sum_total, na.rm=TRUE))
Avg_order %>%
summarise(avg_price_per_order = sum(Avg) / n_distinct(order_id))