Home > Net >  Average of rows grouped by id, but only if COUNT of grouped rows are > than N
Average of rows grouped by id, but only if COUNT of grouped rows are > than N

Time:08-01

I have a table with the following schema:

CREATE TABLE IF NOT EXISTS itemreviews(
  itemid INT NOT NULL,
  reviewid INT NOT NULL,
  reviewrating INT NOT NULL,
  reviewsummary VARCHAR(256) NOT NULL,
  PRIMARY KEY(itemid, reviewid)
);

Here's an example of some data in this table:

 -------- ---------- -------------- ---------------------- 
| itemId | reviewId | reviewRating |    reviewSummary     |
 -------- ---------- -------------- ---------------------- 
|     01 | VM2LK    |            5 | Nice item! Loved it. |
|     01 | P4KF2    |            3 | Meh, it's okay.      |
|     02 | 3IK42    |            1 | Horrible, hated it.  |
|     02 | KDXX2    |            3 | It's fine.           |
|     03 | 3KXXZ    |            4 | I liked it!          |
|     03 | KQ23S    |            3 | It's okay.           |
 -------- ---------- -------------- ---------------------- 

I was tasked with writing a query to sort items by the average of their ratings.

So, using the same dataset above, the result of that query would simply be:

 -------- --------- 
| itemid | average |
 -------- --------- 
|     01 |       4 | // item 01 had ratings 5 and 3 so (5   3) / number of ratings = 4
|     03 |     3.5 | // item 03 had ratings 4 and 3 so (4   3) / number of ratings = 3.5
|     02 |       2 | // item 02 had ratings 1 and 3 so (1   3) / number of ratings = 2
 -------- --------- 

I am currently using this query to achieve this result:

SELECT itemid, AVG(reviewRating) as average FROM itemreviews 
GROUP BY itemid
ORDER BY average DESC 

The problem, however, is that I have another requirement:

We should only include, in the result, items that have at least n number of reviews.

In the example above, all items have 2 reviews equally.

If, suppose, item 03 had only 1 review, and that I only want items with at least 2 reviews, then item 03 should be excluded. Only item 01 and 02 would be sorted:

 -------- --------- 
| itemid | average |
 -------- --------- 
|     01 |       4 | // item 01 had ratings 5 and 3 so (5   3) / number of ratings = 4
|     02 |       2 | // item 02 had ratings 1 and 3 so (1   3) / number of ratings = 2
// item 03 is no more, since it only had one review
 -------- --------- 

I have no idea how to apply this requirement.

I am guessing that I need to group by itemid, then somehow use COUNT to find the number of reviews an item has, and then filter those that don't have at least n reviews, and only then sort it.

But I am already calculating the average, which, I presume, already "counts" internally. I am worried about performance as well.

If anyone can give me any pointers, I'd be very thankful.

CodePudding user response:

As you already Grouping you can use HAVING COUNT(*) to get number of rows

SELECT itemid, AVG(reviewRating) as average FROM itemreviews 
GROUP BY itemid
HAVING COUNT(*) > 1
ORDER BY average DESC 
  • Related