How to exclusively detect subdomains of a URL with a regular expression

Time:07-23

I am making a Chrome extension that is given a list of domains to compare against the URL of the active tab. For example, if the list contains "google", the extension should detect "docs.google.com" as being on the list. I have gotten this part to work. The issue arises when the list contains a subdomain. For example, if "docs.google" is on the list and the user is on "google.com", the extension should not recognize this URL as being on the list.

I am attempting this by constructing a regular expression for each domain and subdomain. As I said, it works properly when the entry is a domain (as opposed to a subdomain), but when I test it with subdomains it does not seem to work. I assume the issue is with how I constructed the RegEx. Does anything stand out? Thank you in advance!

let onDomainList = false;
for (let i = 0; i < domainListLength-1; i++) {
    if (!domainList[i].includes(".")) { //if this domain is not a subdomain
        let strPattern = "^https://www\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "|https://[a-z_]+\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
        let domainRegEx = new RegExp(strPattern,'i');
        if (domainRegEx.test(activeTab.url)) {
            onDomainList = true;
            execute_script(activeTab);
        }
    } else { //if this domain is a subdomain
        let strPattern = "^https://www\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
        let domainRegEx = new RegExp(strPattern,'i');
        if (domainRegEx.test(activeTab.url)) {
            onDomainList = true;
            execute_script(activeTab);
        }
    }
}

EDIT: Changed the RegEx to what Wiktor Stribizew suggested, although the issue of not detecting subdomains remains.

CodePudding user response:

Here is a fixed snippet:

let onDomainList = false;
for (let i = 0; i < domainListLength - 1; i++) {
  if (!domainList[i].includes(".")) { //if this domain is not a subdomain
    // "www." or any other single subdomain label, followed by the escaped entry
    let strPattern = "^https://www\\." + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "|https://[a-z_]+\\." + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    let domainRegEx = new RegExp(strPattern, 'i');
    if (domainRegEx.test(activeTab.url)) {
      onDomainList = true;
      execute_script(activeTab);
    }
  } else { //if this domain is a subdomain
    // optional prefix of subdomain labels ending in a dot, then the escaped entry
    let strPattern = "^https://(?:[^\\s/]*\\.)?" + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    let domainRegEx = new RegExp(strPattern, 'i');
    if (domainRegEx.test(activeTab.url)) {
      onDomainList = true;
      execute_script(activeTab);
    }
  }
}
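
For a quick check outside the extension, the subdomain pattern from the snippet can be wrapped in a small function and run against sample URLs. This is only a sketch: escapeRegExp, buildDomainPattern, and the sample URLs are hypothetical names chosen for illustration, and using the same pattern for every list entry is a simplification rather than what the snippet above does.

```javascript
// Escape regex metacharacters in a list entry (hypothetical helper).
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Build the pattern for one list entry (hypothetical helper):
// "https://", an optional run of subdomain labels ending in a dot,
// then the escaped entry itself.
function buildDomainPattern(entry) {
  return new RegExp("^https://(?:[^\\s/]*\\.)?" + escapeRegExp(entry), "i");
}

const docsPattern = buildDomainPattern("docs.google");
console.log(docsPattern.test("https://docs.google.com/document")); // true
console.log(docsPattern.test("https://www.docs.google.com/"));     // true
console.log(docsPattern.test("https://google.com/"));              // false

const googlePattern = buildDomainPattern("google");
console.log(googlePattern.test("https://docs.google.com/"));       // true
console.log(googlePattern.test("https://notgoogle.com/"));         // false
```

Because nothing is required after the entry, a pattern built this way still accepts hosts that merely begin with the entry, so it may need tightening depending on what the list is meant to cover.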

Notes:

  • Since you are using the RegExp constructor notation and define the regex with a regular string literal, the backslashes that escape special chars must themselves be doubled. Here, there is no need to escape /, and the . needs two backslashes: the "\\." string literal is actually the text \.
  • The variable texts need escaping before they are used in the pattern, hence domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')
  • A / placed before ^ renders the regex useless, since there can be no / before the start of the string, and thus /^ is a regex that never matches any string. / as regex delimiters should not be used in RegExp constructor notation
  • Your subdomain regex only matches https://www. followed by the entry from your list. To allow anything before the entry, you can replace www\. with (?:[^\s/]*\.)? which matches an optional sequence ((?:...)? is an optional non-capturing group) of zero or more chars other than whitespace and / (the [^\s/]* negated character class) followed by a dot.
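
The points in the notes can be checked directly in a JavaScript console. The snippet below is a small illustration; the variable names and sample strings are chosen here and do not come from the question.

```javascript
// In a regular string literal, "\\." is the two-character text \. ,
// which the RegExp constructor reads as "a literal dot".
const escapedDot = "\\.";
console.log(escapedDot.length); // 2

// With a single backslash, "\." collapses to "." inside the string,
// so the resulting regex dot matches any character.
const loose = new RegExp("^https://www\.google");
const strict = new RegExp("^https://www\\.google");
console.log(loose.test("https://wwwXgoogle.com"));  // true
console.log(strict.test("https://wwwXgoogle.com")); // false

// Inside the constructor string a leading "/" is a literal slash, not a
// delimiter, so "/^" demands a slash before the start of the string.
const neverMatches = new RegExp("/^https://www\\.google");
console.log(neverMatches.test("https://www.google.com")); // false

// The optional group allows any run of subdomain labels before the entry.
const optionalPrefix = new RegExp("^https://(?:[^\\s/]*\\.)?docs\\.google", "i");
console.log(optionalPrefix.test("https://docs.google.com"));     // true
console.log(optionalPrefix.test("https://a.b.docs.google.com")); // true
console.log(optionalPrefix.test("https://google.com"));          // false
```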