I have a function that fetches a webpage and uses a regular expression to tell me whether a Google Analytics tracking code is on the site:
function checkUA($domain) {
    $input = file_get_contents($domain);
    if ($input !== false) {
        // regex to check for UA or GA4 code that returns all matches
        $regex = '/\b[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:\-[1-9]\d{0,3})?\b/';
        // if there is a match, return a unique array (tracking may be included more than once)
        if (preg_match_all($regex, $input, $matches)) {
            return array_unique($matches[0]);
        } else {
            // if no match is found, let us know
            return 'no match found';
        }
    } else {
        return 'Site is blocked from crawlers';
    }
}
This function will find any tracking IDs that start with UA, YT, MO, G, DC, or AW, for example:
UA-12345-1 G-J2DV45G DC-JGWWE32 AW-GER322
I'm trying to extend the regular expression so that it also finds Google Tag Manager IDs. This is the current pattern:
$regex = '/\b[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:\-[1-9]\d{0,3})?\b/';
A GTM ID looks like this: GTM-5TDMDSZ
I can't figure out how to extend the regular expression above so that it also matches GTM IDs like that one.
CodePudding user response:
You could add an alternation (|) to the pattern for the GTM ID and enclose both alternatives in a non-capturing group, so that the word boundaries on the left and right apply to both alternatives.
\b(?:[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:-[1-9]\d{0,3})?|GTM-[A-Z0-9]{1,7})\b
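As a rough sketch of how that pattern might be used, the snippet below runs it against a few of the sample IDs from the question. The helper name findTrackingIds and the sample string are made up for illustration; only the pattern comes from the answer above.

```php
<?php
// Hypothetical helper: returns the unique tracking and GTM IDs found in $text.
function findTrackingIds(string $text): array
{
    // Both alternatives sit inside one non-capturing group,
    // so the leading and trailing \b apply to either branch.
    $regex = '/\b(?:[A-Z][A-Z0-9]?-[A-Z0-9]{4,10}(?:-[1-9]\d{0,3})?|GTM-[A-Z0-9]{1,7})\b/';
    preg_match_all($regex, $text, $matches);

    // array_values() reindexes the list after array_unique() drops duplicates.
    return array_values(array_unique($matches[0]));
}

// The GTM ID appears twice in the input but only once in the result.
var_export(findTrackingIds('UA-12345-1 G-J2DV45G GTM-5TDMDSZ GTM-5TDMDSZ'));
```

Because preg_match_all() leaves $matches[0] empty when nothing matches, this sketch always returns an array, which may be simpler to handle than mixing arrays and strings.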
CodePudding user response:
As fourthbird mentioned, using a pipe to mean "or" will do the job.
I recommend tightening your existing pattern so that it only honors the specific tracker ids that you're intending to target.
To make your code easier to read, use the x pattern modifier so that typed spaces are ignored by the regex engine. You can comment inside of your regex by using # to separate pattern from comment.
Code:
$string = 'UA-12345-1 G-J2DV45G NOPE DC-JGWWE32 AW-GER322 NAH-MATE GTM-5TDMDSZ G-WIZ';
$trackingPrefixes = ['UA', 'YT', 'MO', 'G', 'DC', 'AW'];
preg_match_all(
    '/\b
    (?:
        (?:' . implode('|', $trackingPrefixes) . ')-[A-Z\d]{4,10}(?:-[1-9]\d{0,3})?  #Tracker Ids
        |
        GTM-[A-Z\d]+  #Google Tag Manager Ids (one or more characters after the hyphen)
    )
    \b/x',
    $string,
    $m
);
var_export($m[0]);
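If the prefix list is likely to change, the same idea could be wrapped in a small builder function. This is only a sketch: the function name buildTrackingRegex is hypothetical, the use of preg_quote() is an extra precaution not present in the answer, and the + quantifier on the GTM branch is an assumption about how long those IDs can be.

```php
<?php
// Hypothetical helper: builds the commented x-mode pattern from a list of prefixes.
function buildTrackingRegex(array $prefixes): string
{
    // preg_quote() escapes any regex metacharacters that a prefix might contain.
    $alternatives = implode('|', array_map(fn($p) => preg_quote($p, '/'), $prefixes));

    return '/\b
        (?:
            (?:' . $alternatives . ')-[A-Z\d]{4,10}(?:-[1-9]\d{0,3})?  # tracker IDs
            |
            GTM-[A-Z\d]+  # Google Tag Manager IDs; the length is an assumption
        )
    \b/x';
}

$regex = buildTrackingRegex(['UA', 'YT', 'MO', 'G', 'DC', 'AW']);
preg_match_all($regex, 'UA-12345-1 NAH-MATE GTM-5TDMDSZ G-WIZ', $found);

// NAH-MATE has no listed prefix and G-WIZ is too short, so neither is matched.
var_export($found[0]);
```

Keeping the prefixes in an array means a new tracker type only needs one extra array entry, while the GTM branch stays separate because its IDs follow a different shape.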