Home > OS >  How to find the average total per order in R
How to find the average total per order in R

Time:02-14

I'm trying to find the average total of an order across the entire data set. Here is what a little bit of the data set looks like:

order_id qty price discount
20752 1 58 5
20753 1 34 0
20753 1 64 0
20753 1 78 0
20754 2 34 0
20755 1 62 0

What I'm looking to do is find the average total of each order.

I'm able to find the average of each order by using this code but I'm not sure how to find the average of each order across the entire data set.

'''
project %>%

    group_by(order_id) %>% 

    summarise(Avg_total = mean(price, na.rm=TRUE))
'''

This gives me the following table:

order_id Avg_total
20752 58
20753 58.7
20754 34
20755 62

However, what I want is this:

Avg_total
89.75

Here is how I calculated that number -

order 20752 has a total of 53

order 20753 has a total of 176

order 20754 has a total of 68

order 20755 has a total of 62

53 176 68 62 = 359 / 4 (# of orders) = 89.75 average total per order across the whole set.

If possible, I would also like to filter out any order totals above 500 to remove any outliers.

CodePudding user response:

So, you have the following code:

Book1 %>%
  group_by(Book1$order_id)%>%
  summarise(Avg_total=mean(price, na.rm=TRUE))

which gives you the grouping summarizing it by average of price for each order_id (e.g. 20752 -> 58/1 = 58, 20753 -> (34 64 78)/3 = 58.66666 = 58.7)

Group_by and summarize output

For your case I would first create a new column with the values price-discount as follows:

Book1$final_price = Book1$price-Book1$discount

The code simply subtracts the discount from price and adds it to a new column called final_price. R handles the operation row by row.

Which gives the following result:

new column

This, by following the code that you shared gives a different table (meaning that the discount is removed from the price):

    Book1 %>%
  group_by(Book1$order_id)%>%
  summarise(Avg_total=mean(final_price, na.rm=TRUE))

new summarization

What you are having though is 58 (34 64 78) 34 62 = 330 (for not removing discount) or 53 (34 64 78) 34 62 = 325 (by removing discount).

Therefore you mean would be (53 176 34 62)/4 = 81.25

You could find the desired output:

        Book1%>%
  group_by(Book1$order_id)%>%
  summarise(Sum_total=sum(final_price, na.rm=TRUE))%>%
  summarise(Avg = mean(Sum_total, na.rm=TRUE))

What happens here is that we are first grouping by the order_id, then creating the sum (and not mean) for every order_id and then doing the mean.

Please be careful, this finds the mean of totals for order_ids and not the mean of totals for every order, which should be the following:

    Book1%>%
  summarise(Avg = mean(price, na.rm=TRUE))

In order to remove rows you could run the following before any arithmetic operations:

x <-subset(Book1, price<=500)

CodePudding user response:

I was able to actually able to find the answer using this code

Avg_order <- project %>% 
  group_by(order_id) %>% 
  summarise(Sum_total= ((price * qty) - discount), na.rm=TRUE) %>% 
  summarise(Avg = mean(Sum_total, na.rm=TRUE))

Avg_order %>% 
  summarise(avg_price_per_order = sum(Avg) / n_distinct(order_id))
  • Related