Beginner SQL: JOIN clause skewing results of query

Time:03-19

Thank you all for taking the time to read and help if you can! The query below is getting large and messy, and I was hoping someone could point me in the right direction, as I am still a beginner.

SELECT
  DATE(s.created_time_stamp) AS Date,
  s.security_profile_id AS Name,
  COUNT(*) AS logins,
  CASE
    WHEN COUNT(s.security_profile_id) <= 1
    THEN '1'
    WHEN COUNT(s.security_profile_id) BETWEEN 2 AND 3
    THEN '2-3'
    ELSE '4+'
  END AS sessions_summary
FROM session AS s
INNER JOIN member AS m
  ON s.security_profile_id = m.security_profile_id
JOIN member_entitlement AS me ON m.id = me.member_id
JOIN member_package AS mp ON me.id = mp.member_entitlement_id
**JOIN member_channels AS mc ON mc.member_id = m.id**
where member_status = 'ACTIVE'
  and metrix_exempt = 0
  and m.created_time_stamp >= STR_TO_DATE('03/08/2022', '%m/%d/%Y')
  and display_name not like 'john%doe%'
  and email not like '%@aeturnum.com'
  and email not like '%@trendertag.com'
  and email not like '%@sargentlabs.com'
  and member_email_status = 'ACTIVE'
  and mp.package_id = 'ca972458-bc43-4822-a311-2d18bad2be96'
  and display_name IS NOT NULL
  and s.security_profile_id IS NOT NULL 
  **and mc.id IS NOT NULL** 
GROUP BY
  DATE(created_time_stamp),
  Name
ORDER BY
  DATE(created_time_stamp),
  Name

The two parts of the query marked with asterisks are the most recently added clauses, and they skew the data. Without them, the query runs fine. I am trying to get a session summary, which works fine, but I only want the sessions of people who have created a 'channel'. Maybe mc.id IS NOT NULL is not the way to do this. Below is my query that shows how many people have created channels. Essentially, I am trying to combine these two queries in the cleanest way possible. Any advice is greatly appreciated!

-- Users that have Topic Channels and Finished Set Up FOR TRIAL DASH
select count(distinct(m.id)) AS created_topic_channel
from member m
  right join member_channels mc on mc.member_id = m.id
  left join channels c on c.id = mc.channels_id
  JOIN member_entitlement AS me ON m.id = me.member_id
  JOIN member_package AS mp ON me.id = mp.member_entitlement_id
where title not like '@ Mentions'
  and member_status = 'ACTIVE'
  and metrix_exempt = 0
  and m.created_time_stamp >= STR_TO_DATE('03/08/2022', '%m/%d/%Y')
  and display_name not like 'john%doe%'
  and email not like '%@aeturnum.com'
  and email not like '%@trendertag.com'
  and email not like '%@sargentlabs.com'
  and member_email_status = 'ACTIVE'
  and display_name IS NOT NULL
  and mp.package_id = 'ca972458-bc43-4822-a311-2d18bad2be96';

The metric I am trying to retrieve from the DB is how many users have created a channel and logged in at least twice. Thank you again and have a wonderful day!!

CodePudding user response:

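If id is the primary key of member_channels, then it does not make sense to check whether it is null: a primary key is never NULL, and after an inner join every returned row already has a matching member_channels row.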

If all you want is to check whether a member has a 'channel' created, then instead of the additional join to member_channels, which may cause the query to return more rows than expected, you could use EXISTS in the WHERE clause:

where member_status = 'ACTIVE'
  and .......................
  and EXISTS (SELECT 1 FROM member_channels AS mc WHERE mc.member_id = m.id)
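
For reference, here is roughly how the whole query might look with the extra join dropped and the EXISTS check in its place. This is only a sketch: the table and column names are copied from the question, the schema is assumed rather than verified, and the '4+' label is my guess at the intended text. I have also qualified created_time_stamp with s. in GROUP BY and ORDER BY, on the assumption that the session timestamp is the one meant.

```sql
-- Sketch only: names come from the question; the schema is assumed.
SELECT
  DATE(s.created_time_stamp) AS Date,
  s.security_profile_id AS Name,
  COUNT(*) AS logins,
  CASE
    WHEN COUNT(s.security_profile_id) <= 1 THEN '1'
    WHEN COUNT(s.security_profile_id) BETWEEN 2 AND 3 THEN '2-3'
    ELSE '4+'  -- label assumed
  END AS sessions_summary
FROM session AS s
INNER JOIN member AS m
  ON s.security_profile_id = m.security_profile_id
JOIN member_entitlement AS me ON m.id = me.member_id
JOIN member_package AS mp ON me.id = mp.member_entitlement_id
WHERE member_status = 'ACTIVE'
  AND metrix_exempt = 0
  AND m.created_time_stamp >= STR_TO_DATE('03/08/2022', '%m/%d/%Y')
  AND display_name NOT LIKE 'john%doe%'
  AND email NOT LIKE '%@aeturnum.com'
  AND email NOT LIKE '%@trendertag.com'
  AND email NOT LIKE '%@sargentlabs.com'
  AND member_email_status = 'ACTIVE'
  AND mp.package_id = 'ca972458-bc43-4822-a311-2d18bad2be96'
  AND display_name IS NOT NULL
  AND s.security_profile_id IS NOT NULL
  -- Semi-join: keeps or drops a member, but never multiplies session rows.
  AND EXISTS (
    SELECT 1
    FROM member_channels AS mc
    WHERE mc.member_id = m.id
  )
GROUP BY
  DATE(s.created_time_stamp),
  s.security_profile_id
ORDER BY
  DATE(s.created_time_stamp),
  s.security_profile_id;
```

Because EXISTS only tests whether at least one matching row is present, a member with three channels still contributes each session exactly once.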

CodePudding user response:

I would guess your tables aren't at the same level of granularity. A member may have many sessions, and 0-many channels.

E.g., if member 123 has five sessions and creates three channels, the join produces 5 × 3 = 15 rows for that member, so each session is counted three times.
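
You can reproduce the effect without touching your real tables. The snippet below is a self-contained illustration with made-up data (the CTE names sessions and channels and their columns are hypothetical, not your schema); it should run on MySQL 8+ or any engine that supports CTEs.

```sql
-- Made-up data: one member (123) with five sessions and three channels.
WITH sessions AS (
  SELECT 123 AS member_id, 1 AS session_id UNION ALL
  SELECT 123, 2 UNION ALL
  SELECT 123, 3 UNION ALL
  SELECT 123, 4 UNION ALL
  SELECT 123, 5
),
channels AS (
  SELECT 123 AS member_id, 1 AS channel_id UNION ALL
  SELECT 123, 2 UNION ALL
  SELECT 123, 3
)
SELECT
  COUNT(*) AS joined_rows,                       -- 15: every session pairs with every channel
  COUNT(DISTINCT s.session_id) AS real_sessions  -- 5: the actual number of sessions
FROM sessions AS s
JOIN channels AS c
  ON c.member_id = s.member_id;
```

The gap between joined_rows and real_sessions is exactly the inflation you are seeing in COUNT(*) AS logins.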

To adjust for this, it's best practice to join on the same level of granularity. You could roll up sessions to the member level, channels to the member level, and then join both against members.
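
A minimal sketch of that roll-up, aimed at the metric you describe (users who created a channel and logged in at least twice). The table and column names are taken from your question, but I am assuming each member has a single security_profile_id, and I have left out your status, email, and package filters for brevity; they would go in the outer WHERE clause.

```sql
-- Sketch only: schema assumed from the question, filters omitted.
SELECT COUNT(*) AS channel_creators_with_2plus_logins
FROM member AS m
JOIN (
  -- One row per security profile: total logins.
  SELECT security_profile_id, COUNT(*) AS logins
  FROM session
  WHERE security_profile_id IS NOT NULL
  GROUP BY security_profile_id
) AS s
  ON s.security_profile_id = m.security_profile_id
JOIN (
  -- One row per member: number of channels created.
  SELECT member_id, COUNT(*) AS channels
  FROM member_channels
  GROUP BY member_id
) AS mc
  ON mc.member_id = m.id
WHERE s.logins >= 2;
```

Both subqueries return at most one row per member, so the joins cannot multiply rows. If you add the entitlement and package joins back in, check that they are also one row per member, or use EXISTS for them, otherwise the same fan-out can reappear.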
