A friend of mine came up with the following query for finding the winning bets
SELECT b1.*
FROM bids as b1
LEFT JOIN bids AS b2 ON b1.item_id = b2.item_id AND b1.bid_price < b2.bid_price
WHERE b2.item_id IS NULL
It appears to be working correctly, but I don't understand how it works and whether it's by accident or the result will be always the same. Can someone please explain, how it works, and especially how does b1.bid_price < b2.bid_price
work with apparently non-existent b2.bid_price?
Here's the dump:
CREATE TABLE `bids` (
`bid_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`bid_price` decimal(10,2) unsigned NOT NULL,
`user_id` int(11) unsigned NOT NULL,
`item_id` int(11) unsigned NOT NULL,
`bid_date_created` datetime NOT NULL DEFAULT current_timestamp(),
PRIMARY KEY (`bid_id`)
) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHARSET=utf8mb4
INSERT INTO bids VALUES (1, 14000.33, 12, 1, '2021-10-27 19:07:21');
INSERT INTO bids VALUES (2, 13000.60, 6, 1, '2021-10-27 18:07:21');
INSERT INTO bids VALUES (3, 21000.00, 7, 4, '2021-10-29 18:07:21');
INSERT INTO bids VALUES (4, 17000.25, 6, 4, '2021-10-27 18:07:21');
INSERT INTO bids VALUES (6, 5500.00, 6, 7, '2021-11-21 11:46:17');
INSERT INTO bids VALUES (7, 1000.00, 6, 29, '2021-11-22 11:21:41');
INSERT INTO bids VALUES (8, 18000.00, 6, 1, '2021-11-23 11:21:11');
INSERT INTO bids VALUES (9, 110.00, 14, 30, '2021-11-28 15:24:56');
INSERT INTO bids VALUES (10, 120.00, 13, 30, '2021-11-28 15:25:11');
INSERT INTO bids VALUES (11, 159.00, 14, 30, '2021-11-28 15:25:19');
INSERT INTO bids VALUES (12, 170.00, 13, 30, '2021-11-28 15:25:34');
INSERT INTO bids VALUES (13, 200.00, 14, 30, '2021-11-28 15:25:57');
INSERT INTO bids VALUES (14, 250.00, 13, 30, '2021-11-28 15:26:02');
INSERT INTO bids VALUES (15, 6000.00, 14, 6, '2021-11-28 15:26:30');
INSERT INTO bids VALUES (16, 7300.00, 13, 6, '2021-11-28 15:26:44');
INSERT INTO bids VALUES (17, 10000.00, 14, 6, '2021-11-28 15:29:14');
CodePudding user response:
The query uses LEFT OUTER JOIN to try to find a row b2
that matches the same item_id
as b1
, but a greater bid_price
than the value in b1
.
The way left outer joins work is that if there is no match, then the columns from the left table (b1
) are returned anyway, and the columns of b2
are NULL. You can see that in your example data if we test the query without its WHERE clause.
mysql> SELECT b1.bid_id, b1.item_id, b2.bid_id, b2.item_id
FROM bids as b1 LEFT JOIN bids AS b2 ON b1.item_id = b2.item_id
AND b1.bid_price < b2.bid_price;
-------- --------- -------- ---------
| bid_id | item_id | bid_id | item_id |
-------- --------- -------- ---------
| 2 | 1 | 1 | 1 |
| 4 | 4 | 3 | 4 |
| 1 | 1 | 8 | 1 |
| 2 | 1 | 8 | 1 |
| 9 | 30 | 10 | 30 |
| 9 | 30 | 11 | 30 |
| 10 | 30 | 11 | 30 |
| 9 | 30 | 12 | 30 |
| 10 | 30 | 12 | 30 |
| 11 | 30 | 12 | 30 |
| 9 | 30 | 13 | 30 |
| 10 | 30 | 13 | 30 |
| 11 | 30 | 13 | 30 |
| 12 | 30 | 13 | 30 |
| 9 | 30 | 14 | 30 |
| 10 | 30 | 14 | 30 |
| 11 | 30 | 14 | 30 |
| 12 | 30 | 14 | 30 |
| 13 | 30 | 14 | 30 |
| 15 | 6 | 16 | 6 |
| 15 | 6 | 17 | 6 |
| 16 | 6 | 17 | 6 |
| 3 | 4 | NULL | NULL |
| 6 | 7 | NULL | NULL |
| 7 | 29 | NULL | NULL |
| 8 | 1 | NULL | NULL |
| 14 | 30 | NULL | NULL |
| 17 | 6 | NULL | NULL |
-------- --------- -------- ---------
Most of the rows match b1
to some row b2
with the same item_id
but a greater bid_price
.
In the last six rows, the bid b1
has no other bid with a greater bid_price
, so those six are the ones that are greatest for each item_id
. Those are the ones you want in the result.
An easy way of filtering for those is with the WHERE clause you saw:
WHERE b2.item_id IS NULL
Really, you could use any column from b2
that is known to be non-NULL, because that means the only way it would be NULL is that there was no match because of the LEFT OUTER JOIN.
CodePudding user response:
Here's a quick breakdown of what's happening:
Let's start with the join itself. LEFT JOIN
indicates that it will get all results from the left side (b1) and any corresponding results from the right side (b2) if they exist. If not, all the columns in b2 will be null (keep this in mind as it plays a huge part in how this query works).
Next, it joins on item_id
, so it only gets the prices for that specific item. Then, it has another join condition, b1.bid_price < b2.bid_price
. Any comparison with NULL
will ultimately be evaluated as false. Keep in mind that both of these conditions have to be met or else all the rows in b2
are null, and also that b2
is representing each entry in the bets
table. This means that if b2
is null, it's either the only bet for that item (because of the b1.item_id = b2.item_id
constraint), or it's the highest (because of the b1.bid_price < b2.bid_price
constraint).
Lastly, the WHERE
clause. As mentioned above, the JOIN
clause is set up perfectly so that the highest bids will all have NULL
values in b2
. That makes this part easy, because now the WHERE
clause can simply get every instance where the joined b2
has NULL
values.