Regex to return all subdomains from a given domain

Time:08-09

Given a domain string like aaaa.bbbb.cccc.dddd I am trying to iterate over all of its subdomains i.e.

aaaa.bbbb.cccc.dddd
bbbb.cccc.dddd
cccc.dddd
dddd

I thought the regex ((?:[a-zA-Z0-9]+\.)*)([a-zA-Z0-9]+)$ would do the trick (please ignore the fact that I am only matching the characters [a-zA-Z0-9]); however, it only matches the full string.

How can I modify it to make it work?

Edit 1: The following code

var pattern = Pattern.compile("((?:[a-zA-Z0-9]+\\.)*)([a-zA-Z0-9]+)$"); //fixed regex here
var matcher = pattern.matcher("aaaa.bbbb.cccc.dddd");
matcher.results()
    .forEach(matchResult -> System.out.println(matchResult.group()));

should print (in any order)

aaaa.bbbb.cccc.dddd
bbbb.cccc.dddd
cccc.dddd
dddd

CodePudding user response:

The regex you're looking for is

(?=(?:^|\.)([\.\w]+)*)

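This pattern is based on a lookahead. Because a lookahead is zero-width, it consumes no characters, so each match attempt can capture text that overlaps with what was captured in a previous iteration. The lookahead succeeds at the start of the string and at every dot, and group 1 captures everything from that label to the end.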

Java Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        final String regex = "(?=(?:^|\\.)([\\.\\w]+)*)";
        final String domain = "aaaa.bbbb.cccc.dddd";
        
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(domain);
        
        // Each match is zero-width; the subdomain itself is in capture group 1.
        while (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}
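
The question uses matcher.results(), so here is a minimal sketch of the same lookahead pattern in that style (assumes Java 9 or later). The class name SubdomainLister and the method subdomains are made up for illustration. Note that matchResult.group() would return an empty string for every match here, because the lookahead itself is zero-width; the text has to be read from group(1).

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SubdomainLister {
    // Same pattern as in the answer above: the lookahead consumes nothing,
    // and group 1 captures from the current label to the end of the input.
    private static final Pattern SUFFIX =
            Pattern.compile("(?=(?:^|\\.)([\\.\\w]+)*)");

    // Hypothetical helper: collects group 1 of every zero-width match.
    public static List<String> subdomains(String domain) {
        return SUFFIX.matcher(domain)
                .results()
                .map(r -> r.group(1))
                .filter(s -> s != null) // group 1 is null if nothing follows the anchor
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints the four suffixes, longest first.
        subdomains("aaaa.bbbb.cccc.dddd").forEach(System.out::println);
    }
}
```

For the input from the question this prints aaaa.bbbb.cccc.dddd, bbbb.cccc.dddd, cccc.dddd and dddd, one per line.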

CodePudding user response:

This should group your blocks right: (([a-zA-Z].[a-zA-Z]*))

The output is as follows:

aaaa aaaa aaaa bbbb bbbb bbbb cccc cccc cccc dddd dddd dddd

Now you can write your code and discard the entries you do not need. This would be my solution, even if the code is not perfect; it is only a quick one. Hope this helps a bit.
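
One possible way to do that post-processing is sketched below. It is only an illustration, not the code this answer had in mind: it swaps the regex above for a plain String.split on the dot and then joins each tail of the label array back together. The names LabelSuffixes and suffixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LabelSuffixes {
    // Hypothetical helper: splits the domain into labels, then rebuilds
    // every suffix (labels i..end) by joining the labels with dots.
    public static List<String> suffixes(String domain) {
        String[] labels = domain.split("\\.");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            String[] tail = Arrays.copyOfRange(labels, i, labels.length);
            result.add(String.join(".", tail));
        }
        return result;
    }

    public static void main(String[] args) {
        suffixes("aaaa.bbbb.cccc.dddd").forEach(System.out::println);
    }
}
```

This avoids overlapping regex matches entirely, at the cost of not validating which characters a label may contain.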
