Adding functionality to regular expression code for finding analytics tracking id

Time:07-21

I have a function that fetches a webpage and uses a regular expression to tell me whether a Google Analytics code is on the site:

 function checkUA($domain) {
    $input = file_get_contents($domain);
    if ( $input !== false ){
        //regex to check for UA or GA4 code that returns all matches
        $regex = '/\b[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:\-[1-9]\d{0,3})?\b/';
        //if there is a match, return a unique array (tracking may be included more than once)
        if(preg_match_all($regex, $input,$matches)){
            return array_unique($matches[0]);
        }else{
            //if no match is found, let us know
            return 'no match found';    
        }
    }else{
        return 'Site is blocked from crawlers';
    }
 }

This function will find any tracking IDs that start with UA, YT, MO, G, DC, or AW, for example:

UA-12345-1 G-J2DV45G DC-JGWWE32 AW-GER322

I'm trying to extend the regular expression so that it also finds Google Tag Manager IDs:

 $regex = '/\b[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:\-[1-9]\d{0,3})?\b/';

So an ID that looks like this: GTM-5TDMDSZ

I can't seem to figure out how to add a check into my regular expression above that will also include checking for GTM IDs like the one above.

CodePudding user response:

You could add an alternation | to the pattern for the GTM ID and enclose both patterns in a non-capturing group, so that the word boundaries at the left and right apply to both alternatives.

\b(?:[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:-[1-9]\d{0,3})?|GTM-[A-Z0-9]{1,7})\b
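
As a quick check, here is a minimal sketch that runs this pattern over the sample IDs from the question. The $input string and the $ids variable are made up for illustration; only the pattern comes from the answer.

```php
<?php
// Sample input (an assumption): the IDs from the question plus the GTM ID.
$input = 'UA-12345-1 G-J2DV45G DC-JGWWE32 AW-GER322 GTM-5TDMDSZ';

// Both alternatives sit inside one non-capturing group,
// so the \b on each side applies to whichever branch matches.
$regex = '/\b(?:[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:-[1-9]\d{0,3})?|GTM-[A-Z0-9]{1,7})\b/';

preg_match_all($regex, $input, $matches);

// array_unique() drops repeated IDs; array_values() reindexes the result.
$ids = array_values(array_unique($matches[0]));
print_r($ids);
```

Without the group, the trailing \b would only belong to the GTM branch and the leading \b only to the first branch.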

CodePudding user response:

As fourthbird mentioned, using a pipe to mean "or" will do the job.

I recommend tightening your existing pattern so that it only honors the specific tracker ids that you're intending to target.

To make your code easier to read, use the x pattern modifier so that typed spaces are ignored by the regex engine. You can comment inside of your regex by using # to separate pattern from comment.

Code:

$string = 'UA-12345-1 G-J2DV45G NOPE DC-JGWWE32 AW-GER322 NAH-MATE GTM-5TDMDSZ G-WIZ';

$trackingPrefixes = ['UA', 'YT', 'MO', 'G', 'DC', 'AW'];

preg_match_all(
    '/\b
        (?:
           (?:' . implode('|', $trackingPrefixes) . ')-[A-Z\d]{4,10}(?:-[1-9]\d{0,3})?   #Tracker Ids
           |
           GTM-[A-Z\d]+                                                                  #Google Tag Manager Ids
        )
    \b/x',
    $string,
    $m
);
var_export($m[0]);
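
To tie this back to the question, here is one possible way to fold the tightened pattern into checkUA(). This is only a sketch: findTrackingIds() is a hypothetical helper name, the one-or-more quantifier on the GTM branch is an assumption about how long those IDs can be, and the @ on file_get_contents() is just one way to silence the warning when the fetch fails.

```php
<?php
// Hypothetical helper: pulls tracker and GTM IDs out of an HTML string.
function findTrackingIds(string $html): array
{
    $trackingPrefixes = ['UA', 'YT', 'MO', 'G', 'DC', 'AW'];
    $regex = '/\b
        (?:
           (?:' . implode('|', $trackingPrefixes) . ')-[A-Z\d]{4,10}(?:-[1-9]\d{0,3})?   # tracker IDs
           |
           GTM-[A-Z\d]+                                                                  # Google Tag Manager IDs
        )
    \b/x';
    preg_match_all($regex, $html, $m);
    // array_values() reindexes after array_unique() removes duplicates.
    return array_values(array_unique($m[0]));
}

// The question's checkUA(), reworked to call the helper.
function checkUA(string $domain)
{
    $input = @file_get_contents($domain); // @ suppresses the warning on failure
    if ($input === false) {
        return 'Site is blocked from crawlers';
    }
    $ids = findTrackingIds($input);
    // Same return values as the original: the ID array, or a message when empty.
    return $ids ?: 'no match found';
}

var_export(findTrackingIds('<script>gtag("config", "UA-12345-1");</script> GTM-5TDMDSZ UA-12345-1 G-WIZ'));
```

Keeping the matching in its own function means the pattern can be tried against a plain string without fetching a page.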