Home > Mobile >  REGEXP_SUBSTR to match URL subdirectory and group by results
REGEXP_SUBSTR to match URL subdirectory and group by results

Time:06-16

Trying use REGEXP_SUBSTR to get the type/fruit extracted and group on this.

Data sample e.g

www.fruits.com/type/banana?=asdasdasdwa 
www.fruits.com/type/Apple?=asdasdasdwa
www.fruits.com/type/Apple?=asdasd21414
www.fruits.com/type/Orange?=asdasdasdwa

Using following SQL

  SELECT DATE
        COUNT(*) AS Sessions
       ,REGEXP_SUBSTR(hits.page.pagePath, '/type/?$',1, 1)
    FROM
      `159484249.ga_sessions_*`  WITH_VISIT
      LEFT JOIN UNNEST(hits) AS hits
      LEFT JOIN UNNEST(hits.customdimensions) AS customdimension
    WHERE
      _TABLE_SUFFIX BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)) AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
      AND ((totals.pageviews>1 or totals.pageviews is null) or totals.totalTransactionRevenue is not null or totals.newVisits is null or totals.timeOnSite is not null)
      AND hits.page.pagePath LIKE '%/type/%' AND hits.type = 'PAGE'
     
    GROUP BY DATE,REGEXP_SUBSTR(hits.page.pagePath, '/type/?$',1, 1)
    
    ORDER BY DATE ASC, Sessions ASC;

Desired result

   Date       | Sessions | Type
   01/01/2021 | 22342    | type/apple
   02/01/2021 | 1000     | type/orange
   03/01/2021 | 2000     | type/banana

My query above returns null for Type

CodePudding user response:

Below you can see the usage of the regex string I provided in the comment to the question:

This would produce the correct substring you are looking for

with sample_data as (
  select 'www.fruits.com/type/banana?=asdasdasdwa' as url union all
  select 'www.fruits.com/type/Apple?=asdasdasdwa' union all
  select 'www.fruits.com/type/Apple?=asdasd21414' union all
  select 'www.fruits.com/type/Orange?=asdasdasdwa' union all
  select 'www.fruits.com/type/ban-a-na?=asdasdasdwa'

)

select regexp_substr(url, r'type\/\w ') original
  , regexp_substr(url, r'type\/([\w|-] )?') dashses
from sample_data

This produces the following

enter image description here

  • Related