Home > Back-end >  MySQL Count after an specific value shows
MySQL Count after an specific value shows

Time:03-16

The problem is, I need to calculate average number of pages/hits after reaching the pax page (including pax hit).

The database is:

CREATE TABLE search (
  SESSION_ID INTEGER,
  HIT_NUMBER INTEGER,
  PAGE VARCHAR(24),
  MEDIUM_T VARCHAR(24) 
);

INSERT INTO search
  (SESSION_ID, HIT_NUMBER, PAGE, MEDIUM_T)
VALUES
  ('123', '1', 'home', 'direct'),
  ('123', '2', 'flights_home', 'direct'),
  ('123', '3', 'results', 'direct'),
  ('456', '1', 'pax', 'metasearch'),
  ('789', '1', 'home', 'partners'),
  ('789', '2', 'flights_home', 'partners'),
  ('789', '3', 'results', 'partners'),
  ('789', '4', 'home', 'partners'),
  ('146', '1', 'results', 'SEM'),
  ('146', '2', 'pax', 'SEM'),
  ('146', '3', 'payment', 'SEM'),
  ('146', '4', 'confirmation', 'SEM');

And my approach is:

SELECT s1.SESSION_ID, COUNT(*) as sCOUNT
FROM search s1
WHERE PAGE = 'pax'
GROUP BY s1.SESSION_ID
UNION ALL
  SELECT 'Total AVG', AVG(a.sCOUNT) FROM
  (SELECT COUNT(*) as sCOUNT
   FROM search s2
   GROUP BY s2.SESSION_ID) a

Obviously the 3r line is wrong, my code misses the part in which after 'pax' is shown starts counting and I don't have any clue for that.

Thank you in advanced :)

CodePudding user response:

My approach uses WITH CTE (common-table-expression) to pre-declare what the underlying query basis is, then querying and averaging from that.

First one premise that was not explicitly covered in your sample data. What happens IF a user bounces back and forth between multiple pages and hits the PAX page more than once. You now have multiple pax page hits. I would assume you want the FIRST instance to such pax page and that is inclusive of all page hits. This solution should help account for it.

Lets look at the inner-most from clause with final alias "pxHits".

I am grouping by session ID and grabbing the FIRST INSTANCE of a pax page hit (or null if no such pax page encountered), but ALSO grabbing the HIGHEST hit number per session. The HAVING clause will make sure that it only returns those sessions that HAD a PAX page are returned leaving all other sessions excluded from the results.

This would result with two entries passed up to the outer select which includes the 1 lastHitNumber - firstPaxHit calculation. The reason for the 1 is because you at least HIT the page once. But, in the scenario of your session 456 where the first and last hit WERE the first page, you need that since the lastHitNumber - firstPaxHit would net zero. This would be true if a person had 25 page hits and got to the pax page on page 26. Your result would still be 1 via 1 26 - 26 = 1 total page including the pax page, not the 25 prior to.

Your other qualifying session would be 146. The first pax hit was 2 but they proceeded to a highest page hit of 4. so 1 4 - 2 = 3 total pages.

So now on to the final. Since you can see the HOW things are prepared, we can now get the averages. You can't mix/auto convert different data types (session_id vs the fixed message of your 'Total Avg'. They must be the same type. So my query is converting the session_id to character to match. I happen to be getting the AVERAGE query first as a simple select from the WITH CTE alias, and THEN getting the actual session_id and counts.

with PaxSummary as
(
select
        pxHits.*,
        1   lastHitNumber - firstPaxHit HitsIncludingPax
    from
        ( select
                session_id,
                min( case when page = 'pax'
                            then hit_number
                            else null end ) firstPaxHit,
                max( hit_number ) lastHitNumber
            from
                search
            group by
                session_id
            having
                min( case when page = 'pax'
                        then hit_number
                        else null end ) > 0 ) pxHits
)


select
        'Avg Pax Pages' FinalMsg,
        avg( ps2.HitsIncludingPax ) HitsIncludingPax 
    from
        PaxSummary ps2
union all
select 
        cast( ps1.session_id as varchar) FinalMsg,
        ps1.HitsIncludingPax
    from
        PaxSummary ps1

CodePudding user response:

Finding all pax pages and the ones after it could be done with exists. Rest is straight forward:

SELECT AVG(hits)
FROM (
    SELECT session_id, COUNT(*) AS hits
    FROM search AS s1
    WHERE page = 'pax' OR EXISTS (
        SELECT *
        FROM search AS s2
        WHERE s2.session_id = s1.session_id
        AND   s2.hit_number < s1.hit_number
        AND   s2.page = 'pax'
    )
    GROUP BY session_id
) AS x

If using MySQL 8 then window functions provide a simpler solution:

WITH cte1 AS (
    SELECT session_id, MAX(CASE WHEN page = 'pax' THEN 1 END) OVER (
        PARTITION BY session_id
        ORDER BY hit_number
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS countme
    FROM search
), cte2 as (
    SELECT COUNT(*) AS hits
    FROM cte1
    WHERE countme IS NOT NULL
    GROUP BY session_id
)
SELECT AVG(hits)
FROM cte2

CodePudding user response:

As an alternative to the EXISTS (correlated subquery) pattern, we can write a query that gets us the hit_number of the first 'pax' hit for each session_id, and use that as an inline view.

Something along these lines:

-- count hits on or after the first 'pax' of each session_id that has a 'pax' hit

SELECT s.session_id
     , COUNT(*) AS cnt_hits_after_pax
  FROM ( -- get the first 'pax' hit for each session_id
         -- exclude session_id that do not have a 'pax' hit
         SELECT px.session_id      AS pax_session_id
              , MIN(px.hit_number) AS pax_hit_number
           FROM search px
          WHERE px.page = 'pax'
       ) p
       -- all the hits for session_id on or after the first 'pax' hit
  JOIN search s
    ON s.session_id  = p.session_id
   AND s.hit_number >= p.hit_number
 GROUP BY s.session_id
   
  

to get an average from that query, we can wrap it parens and turn it into an inline view

SELECT AVG(c.cnt_hits_after_pax) AS avg_cnt_hits_after_pax
  FROM (
         -- query above goes here
       ) c
  • Related