I have two tables...one that registers users and one that checks in users. A user will always have a single entry in the register table but a user may have 0 or multiple entries in the checkin table. For a raffle selector, I wrote a query that is picking 1 entry from the register table and then 1 entry from the checkin table - each sub query picks a random entry so long as that userID does not exist in a 3rd table that stores the raffle winners. After the two entries are returned than it randomly selects one of the two returned entries as the winnner.
However, I believe there should be a more efficient way of writing this so its ONLY picking an entry once....not picking two entries and then picking one of the two.
It took me quite a while to figure out how to correctly write the below query as I am not proficient in mysql at all. The query works and seems to work efficiently, but I believe there should be a better way of writing it that also consolidates the amount of query code.
Hoping someone here can help or advise.
Table note: clubusers/clubHistory have multiple overlapping columns but the tables are not the same:
register = clubUsers
checkins = clubHistory
winners = clubRaffleWinners
SELECT * FROM (
(SELECT ch.user_ID,ch.clID FROM clubHistory AS ch
LEFT OUTER JOIN clubRaffleWinners AS cr1 ON
ch.user_ID=cr1.user_ID
AND cr1.cID=1157
AND cr1.rafID=18
AND cr1.crID=1001
AND cr1.ceID=1167
AND cr1.chDate1='2022-06-04'
WHERE
ch.cID=1157
AND ch.crID=1001
AND ch.ceID=1167
AND ch.chDate='2022-06-04'
AND cr1.user_ID IS NULL
GROUP BY ch.user_ID ORDER BY RAND() LIMIT 1
)
UNION
(SELECT cu.user_ID,cu.clID FROM clubUsers AS cu
LEFT OUTER JOIN clubRaffleWinners AS cr2 ON
cu.user_ID=cr2.user_ID
AND cr2.cID=1157
AND cr2.rafID=18
AND cr2.crID=1001
AND cr2.ceID=1167
AND cr2.chDate1='2022-06-04'
WHERE
cu.cID=1157
AND cu.crID=1001
AND cu.ceID=1167
AND cu.calDate<='2022-06-04'
AND cr2.user_ID IS NULL
GROUP BY cu.user_ID ORDER BY RAND() LIMIT 1
)
) AS foo order by RAND() LIMIT 1 ;
UPDATE:
As @JettoMartinez points out below, my current query could in fact randomly return the same user from each table so the final returned entry would just be the same user. I didn't realize this in my struggles just to get the above query to work. Thus my original OP asking for a more optimized query simply selecting a single random entry from both tables (where that user is not already in the winners table) is applicable for yet another reason.
CodePudding user response:
There are two ways I can think of (Do note that since I don't fully understand the tables, I'm not using all the conditions you used in your JOIN
statements, meaning it might need more work):
Using a exclusive subquery:
SELECT
cu.user_ID,
cu.clID,
ch.cID
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
WHERE user_ID NOT IN (
SELECT
user_ID
FROM
clubRaffleWinners
WHERE
-- other conditions
)
ORDER BY RAND() LIMIT 1;
Using a LEFT "OUTER" JOIN
, as you asked for:
SELECT
cu.user_ID,
cu.clID,
ch.cID -- Or any relevant field from clubHistory, really
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
LEFT JOIN clubRaffleWinners cr ON cr.user_ID = cu.user_ID
AND ... -- other conditions to ensure uniqueness
AND ... -- that could also be in the WHERE part
WHERE
cr.user_ID IS NULL -- this will filter out the INNER part of the JOIN
ORDER BY RAND() LIMIT 1;
I don't have a dataset to properly test this queries, so please take them as a concept. I also didn't queried in clubHistory
since I honestly don't see the point of doing so. Interpolating clubRaggleWinners
to clubUsers
seems enough for me.
EDIT
Since the user_ID
in clubHistory
is relevant to the raffle, I added a LEFT JOIN
to it and added a field from said table in the SELECT
statement, so that the user_id
repeats once per entry in clubHistory
plus the row of clubUsers
, meaning that every user has 1 number of entries / number of users number of entries - number of winners
chances to win.
This logic can be applied to the first query with a subquery too, and if the added field needs to be out, the query could be wrapped in a CTE or a subquery.
CodePudding user response:
From what you are describing, and I want to make sure I understand.
Every registered person is qualified 1 entry.
However, each time they have checked in, they get 1 entry for each time they checked in. So, for someone registered and has NEVER checked-in, they get 1 entry. But if someone registered, and checked in 3 times, they would get a total of JUST the 3 times they checked in, vs 4 just for being registered.
Regardless of who is POSSIBLE, you want to EXCLUDE all people who have already been a winner in the raffle.
You SHOULD be able to get results from this below. Since the columns appear to be the same filtering on the cID, crID, ceID and Date, I have the primary FROM based on the registered clubUsers.
From that, a left-join to the clubHistory will either allow that person's ID to be returned once if only registered, OR multiple times based on the times checked in such as the example.
From the given user, I am also directly left-joining to the raffle winning history on the same criteria. If its the same criteria to the club history join, and the same criteria to the raffle (with exception of rafID = 18
), appearing to indicate a specific raffle being drawn for, If the person is found, or not, the final WHERE accounts to exclude if its the single entry, or multiple entries via the IS NULL test.
The query will return all entries single or multiple, that have not already won in the order by RAND() qualifier, and apply a single LIMIT 1 to get the final winner. I dont know why you needed what appeared to be the clubhouse ID when you only really care about WHO won, without any regard to being a clubhouse history entry or not.
SELECT
cu.user_ID
FROM
clubUsers AS cu
LEFT JOIN clubHistory ch
on cu.user_ID = ch.user_ID
AND cu.cID = ch.cID
AND cu.crID = ch.crID
AND cu.ceID = ch.ceID
AND ch.chDate = '2022-06-04'
LEFT JOIN clubRaffleWinners AS crw
ON cu.user_ID = crw.user_ID
AND cu.cID = crw.cID
AND cu.crID = crw.crID
AND cu.ceID = crw.ceID
AND crw.chDate1 = '2022-06-04'
AND crw.rafID = 18
WHERE
cu.cID = 1157
AND cu.crID = 1001
AND cu.ceID = 1167
AND cu.calDate <= '2022-06-04'
AND crw.user_id IS NULL
order by
RAND()
LIMIT 1
For performance purposes, I would ensure the following indexes
table index
clubUsers ( cid, crID, ceID, calDate, user_id )
clubHistory ( user_id, cID, crID, ceID, chDate )
clubRaffleWinners ( user_id, cID, crID, ceID, chDate1, rafID )