I have a query that uses multiple left joins and trying to get a SUM of values from one of the joined columns.
SELECT
SUM( case when session.usersessionrun =1 then 1 else 0 end) new_unique_session_user_count
FROM session
LEFT JOIN appuser ON appuser.appid = '6279df3bd2d3352aed591583'
AND appuser.userid = session.userid
LEFT JOIN userdevice ON userdevice.appid = '6279df3bd2d3352aed591583'
AND userdevice.userid = appuser.userid
WHERE session.appid = '6279df3bd2d3352aed591583'
AND (session.uploadedon BETWEEN '2022-04-18 08:31:26' AND '2022-05-18 08:31:26')
But this obviously gives a redundant session.usersessionrun=1
counts since it's a joined resultset.
Here the logic was to mark the user as new
if the sessionrun for that record is 1
.
I grouped by userid
and usersessionrun
and it shows that the records are repeated.
userid. sessionrun. count
628212 1 2
627a01 1 4
So what I was trying to do was something like
SUM(CASE distinct(session.userid) AND WHEN session.usersessionrun = 1 THEN 1 ELSE 0 END) new_unique_session_user_count
i.e. for every unique user count, session.usersessionrun = 1 should only be done once.
CodePudding user response:
As you have discovered, JOIN operations can generate combinatorial explosions of data.
You need a subquery to count your sessions by userid. Then you can treat the subquery as a virtual table and JOIN it to the other tables to get the information you need in your result set.
The subquery (nothing in my answer is debugged):
SELECT COUNT(*) new_unique_session_user_count,
session.userid
FROM session
WHERE session.appid = '6279df3bd2d3352aed591583'
AND session.uploadedon BETWEEN '2022-04-18 08:31:26'
AND '2022-05-18 08:31:26'
AND session.usersessionrun = 1
AND session.appid = '6279df3bd2d3352aed591583'
GROUP BY userid
This subquery summarizes your session table and has one row per userid. The trick to avoiding JOIN-created combinatorial explosions is using subqueries that generate results with only one row per data item mentioned in a JOIN's ON-clause.
Then, you join it with the other tables like this
SELECT summary.new_unique_session_user_count
FROM (
SELECT COUNT(*) new_unique_session_user_count,
session.userid
FROM session
WHERE session.appid = '6279df3bd2d3352aed591583'
AND session.uploadedon BETWEEN '2022-04-18 08:31:26'
AND '2022-05-18 08:31:26'
AND session.usersessionrun = 1
AND session.appid = '6279df3bd2d3352aed591583'
GROUP BY userid
) summary
JOIN appuser ON appuser.appid = '6279df3bd2d3352aed591583'
AND appuser.userid = summary.userid
JOIN userdevice ON userdevice.appid = '6279df3bd2d3352aed591583'
AND userdevice.userid = appuser.userid
There may be better ways to structure this query, but it's hard to guess at them without more information about your table definitions and business rules.