I'm trying to get the product_id, date and the total duplicates from a table. This table contains the id, product_id, price, wholesale_price, date_add and some more fields. Currently there are over a million of rows with products added to lookup the price history of a product. But some products have double values with the product_id and the date. For example:
id_price_history | id_product | price | date_add |
---|---|---|---|
1 | 1 | 6.95 | 2021-11-15 12:12:15 |
2 | 1 | 6.95 | 2021-11-15 12:14:25 |
3 | 2 | 4.95 | 2021-11-16 10:18:45 |
4 | 2 | 4.95 | 2021-11-16 11:18:55 |
5 | 3 | 8.95 | 2021-12-15 14:12:15 |
6 | 1 | 7.95 | 2021-12-16 12:14:25 |
7 | 1 | 7.95 | 2021-12-16 12:14:25 |
8 | 1 | 7.95 | 2021-12-16 12:14:25 |
9 | 4 | 29.95 | 2021-12-16 14:12:16 |
10 | 5 | 12.50 | 2021-12-16 14:12:17 |
11 | 1 | 6.95 | 2021-12-17 14:12:20 |
12 | 2 | 5.95 | 2021-12-17 14:12:22 |
So what I would like to have as output:
id_product | date | cnt |
---|---|---|
1 | 2021-11-15 | 2 |
2 | 2021-11-16 | 2 |
1 | 2021-12-16 | 3 |
I have tried different things with GROUP BY, HAVING COUNT() like:
SELECT id_product, date_add, COUNT(*) as total FROM price_history
GROUP BY id_product HAVING COUNT(DISTINCT date_add) > 1;
Or:
SELECT ph.id_product, date_add, COUNT(*) AS cnt FROM price_history ph
JOIN (SELECT id_product FROM price_history HAVING COUNT(DATE_FORMAT(date_add, '%Y-%m-%d')) > 1) b
ON ph.id_product = b.id_product LIMIT 20;
CodePudding user response:
I believe that you're trying to achieve: (though I kept price in the grouping)
SELECT id_product, price, date(date_add) AS 'date', count(*) AS total
FROM price_history
GROUP BY id_product, price, date(date_add) HAVING count(*) > 1;
...which gives your desired output:
id_product price date total
1 6.95 2021-11-15 2
2 4.95 2021-11-16 2
1 7.95 2021-12-16 3
Example dbfiddle.