I have a table with the following schema:
CREATE TABLE IF NOT EXISTS itemreviews(
itemid INT NOT NULL,
reviewid INT NOT NULL,
reviewrating INT NOT NULL,
reviewsummary VARCHAR(256) NOT NULL,
PRIMARY KEY(itemid, reviewid)
);
Here's an example of some data in this table:
-------- ---------- -------------- ----------------------
| itemId | reviewId | reviewRating | reviewSummary |
-------- ---------- -------------- ----------------------
| 01 | VM2LK | 5 | Nice item! Loved it. |
| 01 | P4KF2 | 3 | Meh, it's okay. |
| 02 | 3IK42 | 1 | Horrible, hated it. |
| 02 | KDXX2 | 3 | It's fine. |
| 03 | 3KXXZ | 4 | I liked it! |
| 03 | KQ23S | 3 | It's okay. |
-------- ---------- -------------- ----------------------
I was tasked with writing a query to sort items by the average of their ratings.
So, using the same dataset above, the result of that query would simply be:
-------- ---------
| itemid | average |
-------- ---------
| 01 | 4 | // item 01 had ratings 5 and 3 so (5 3) / number of ratings = 4
| 03 | 3.5 | // item 03 had ratings 4 and 3 so (4 3) / number of ratings = 3.5
| 02 | 2 | // item 02 had ratings 1 and 3 so (1 3) / number of ratings = 2
-------- ---------
I am currently using this query to achieve this result:
SELECT itemid, AVG(reviewRating) as average FROM itemreviews
GROUP BY itemid
ORDER BY average DESC
The problem, however, is that I have another requirement:
We should only include, in the result, items that have at least n
number of reviews.
In the example above, all items have 2 reviews equally.
If, suppose, item 03 had only 1 review, and that I only want items with at least 2 reviews, then item 03 should be excluded. Only item 01 and 02 would be sorted:
-------- ---------
| itemid | average |
-------- ---------
| 01 | 4 | // item 01 had ratings 5 and 3 so (5 3) / number of ratings = 4
| 02 | 2 | // item 02 had ratings 1 and 3 so (1 3) / number of ratings = 2
// item 03 is no more, since it only had one review
-------- ---------
I have no idea how to apply this requirement.
I am guessing that I need to group by itemid
, then somehow use COUNT
to find the number of reviews an item has, and then filter those that don't have at least n
reviews, and only then sort it.
But I am already calculating the average, which, I presume, already "counts" internally. I am worried about performance as well.
If anyone can give me any pointers, I'd be very thankful.
CodePudding user response:
As you already Grouping you can use HAVING COUNT(*)
to get number of rows
SELECT itemid, AVG(reviewRating) as average FROM itemreviews
GROUP BY itemid
HAVING COUNT(*) > 1
ORDER BY average DESC