Home > Mobile >  Calculate number of times a user visited a page in SQL
Calculate number of times a user visited a page in SQL

Time:02-24

I have table as shown below. It has 3 columns user, page and timestamp.

 ------ ------------------ ----------- 
| user |       page       | timestamp |
 ------ ------------------ ----------- 
|    1 | homepage?c=1234  |      1234 |
|    1 | homepage?c=1234  |      1245 |
|    1 | homepage?c=1234  |      1260 |
|    1 | homepage?c=1234  |      1280 |
|    1 | Signup?=1233     |      1293 |
|    1 | Signup?=121asd   |      1303 |
|    1 | Signup?=212      |      1317 |
|    1 | Signup?123213    |      1337 |
|    1 | homepage?c=1234  |      1357 |
|    1 | Hotels           |      1370 |
|    1 | Hotels           |      1384 |
|    1 | Hotels           |      1398 |
|    1 | Signup?=121asd   |      1413 |
|    1 | Signup?=121as123 |      1433 |
|    1 | homepage?c=1234  |      1447 |
|    1 | homepage?c=1234  |      1463 |
|    1 | homepage?c=1234  |      1482 |
|    1 | homepage?c=1234  |      1496 |
 ------ ------------------ -----------  

In the above table I want to calculate number of times a user visits a particular page. Each record in the above table is a hit, so just grouping by column page wont help.

I essentially want to count continuous records for a particular page only once. I only want to increment the counter for a particular page when user visits some other page in between.

Something as shown below:

enter image description here

The user visits homepage 3 times (yellow color), Signup page 2 times (blue color) and Hotels page 1 time (orange in color).

The output expected is as below:

 ------- ---------- -------- -------- 
|       | Homepage | Signup | Hotels |
 ------- ---------- -------- -------- 
| users |        3 |      2 |      1 |
 ------- ---------- -------- -------- 

CodePudding user response:

So what you want to do is first get the page name out of the url, so homepage?c=1234 becomes homepage. Then for each visit you can use LAG to see if the page changed between and use a window SUM function to sum up all the changes up until this point. Each of the page visit groups will then get a unique grouping number, then it's just a matter of counting the distinct grouping numbers for each of them.

I've split this into individual CTEs for clarity and understanding:

WITH pagesplit AS
(
    SELECT user
         , (split(page, '?')[safe_ordinal(1)]) AS page
         , ts 
      FROM so.visits
),
page_with_prev AS
(
    SELECT user
         , page
         , COALESCE(LAG(_page) OVER (PARTITION BY user ORDER BY _ts), '?') AS prev_page
         , ts
      FROM pagesplit
),
count_consecutive AS
(
    SELECT user
         , page
         , SUM(CASE WHEN page != prev_page THEN 1 ELSE NULL END) OVER (PARTITION BY user ORDER BY ts) AS grouping_no
      FROM page_with_prev
)
SELECT user
     , page
     , COUNT(DISTINCT grouping_no) AS visit_count
  FROM count_streaks
 GROUP BY user, page

Pivoting the result into columns may not make much sense unless you know exactly how many you will have and that no new ones will ever be added. You can do that if you want of course.

CodePudding user response:

Consider also below options

Flatten output

select user, page, count(distinct visit_number) visits
from (
  select *, countif(new_visit) over(partition by user order by timestamp) visit_number
  from (
    select *, page != lag(page, 1, '') over(partition by user order by timestamp) new_visit
    from (
      select user, split(page, '?')[offset(0)] page, timestamp
      from your_table
    )
  )
)
group by user, page         

with result

enter image description here

Or with Pivot

select * from (
  select user, page, countif(new_visit) over(partition by user order by timestamp) visit_number
  from (
    select *, page != lag(page, 1, '') over(partition by user order by timestamp) new_visit
    from (
      select user, split(page, '?')[offset(0)] page, timestamp
      from your_table
    )
  )
)
pivot (count(distinct visit_number) for lower(page) in ('homepage', 'signup', 'hotels'))

with output

enter image description here

  • Related