Merge two regexes

Time:11-20

I've written the following regular expression:

(?:.*[sS]trony [Ll]okalne(?: GW -)? )(.+)(?: nr.+)

It aims to match whatever lies between strony lokalne GW - (and its variants) on the left side of the string and nr on the right side. The regex works on my test cases. I use $1 to get my capture group. Below, the captured text is marked with ** on both sides.

strony lokalne GW - **Częstochowa** nr 28
[ DLO SZ ] - Strony Lokalne **Szczecin** nr 
strony lokalne GW - **Olsztyn** nr 111
[ DLO KI ] - strony lokalne GW - **Kielce** nr 270,
strony lokalne GW - **Łódź** nr 17, 
[ DLO SZ ] - Strony Lokalne **Szczecin** nr 72, 
strony lokalne GW - **Warszawa** nr 125, 
[ DLO KR ] - strony lokalne GW - **Kraków** nr 5, 
[ DLO WA ] - Strony Lokalne **Warszawa** nr 152, 
strony lokalne GW - **Zielona G?a** nr 128, 
strony lokalne GW - **Łódź** nr 63, 

I have written another regex to capture a similar group, because I wasn't able to cover both cases with a single regex. Here's my second regex:

(?:GW -? ?)(.+)(?: nr.+)

This time the matching is simple: I need whatever comes after GW and before nr. Some examples:

GW **Szczecin** nr 50\n"
GW **TORUŃ** nr 96, wydanie z dnia 23/04/2004WYDARZENIA, str. 3\n"
GW **Lublin** nr 33, wydanie z dnia 08/02/2006WYDARZENIA , str. 3\n"
GW **Wrocław** nr 45, wydanie z dnia 23/02/2004WYDARZENIA, str. 3\n"

How do I merge those two regexes?

To get my capture group in Java I use matcher.replaceAll("$1"), where matcher is the Matcher object created from the regex pattern.

CodePudding user response:

If you want to keep your current approach, you can use

.*?\b(?:[sS]trony\s+[lL]okalne(?:\s+GW\s+-)?|GW)\s*(.*?)\s*nr.*

See the regex demo. Details:

  • .*? - any zero or more chars other than line break chars as few as possible
  • \b - a word boundary
  • (?: - start of a non-capturing group with two alternatives:
    • [sS]trony\s+[lL]okalne(?:\s+GW\s+-)? - Strony/strony, one or more whitespaces, lokalne/Lokalne, then an optional sequence of one or more whitespaces, GW, one or more whitespaces and -
    • |GW) - or just the word GW; end of the group
  • \s* - zero or more whitespaces
  • (.*?) - Group 1 ($1 in the replacement pattern refers to this group value): zero or more chars other than line break chars, as few as possible
  • \s*nr.* - zero or more whitespaces, nr, and zero or more chars other than line break chars, as many as possible.
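
To make the replaceAll route concrete, here is a minimal sketch that applies the pattern described above through matcher.replaceAll("$1"), as in the question. The class name MergedRegexDemo and the helper extractCity are hypothetical names chosen for illustration; the sample lines are taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MergedRegexDemo {
    // The merged pattern. It has to consume the whole line (leading .*? and
    // trailing .*) so that replaceAll leaves only Group 1 behind.
    private static final Pattern MERGED = Pattern.compile(
        ".*?\\b(?:[sS]trony\\s+[lL]okalne(?:\\s+GW\\s+-)?|GW)\\s*(.*?)\\s*nr.*");

    // Hypothetical helper: replaces the matched line with the captured group.
    static String extractCity(String line) {
        Matcher matcher = MERGED.matcher(line);
        return matcher.replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(extractCity("[ DLO KI ] - strony lokalne GW - Kielce nr 270,"));
        System.out.println(extractCity("[ DLO SZ ] - Strony Lokalne Szczecin nr "));
        System.out.println(extractCity("GW Lublin nr 33, wydanie z dnia 08/02/2006WYDARZENIA , str. 3"));
    }
}
```

Note that replaceAll returns the input unchanged when the pattern does not match, so a line without GW or strony lokalne comes back as is. If that matters, checking matcher.matches() before replacing is one way to tell the two cases apart.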

Another approach is to use a regex for extraction:

\b(?:[sS]trony\s+[lL]okalne(?:\s+GW\s+-)?|GW)\s+(.*?)\s+nr\b

See this regex demo. See the Java demo:

import java.util.*;
import java.util.regex.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        List<String> strs = Arrays.asList(
            "strony lokalne GW - Częstochowa nr 28",
            "[ DLO SZ ] - Strony Lokalne Szczecin nr ",
            "strony lokalne GW - Olsztyn nr 111",
            "[ DLO KI ] - strony lokalne GW - Kielce nr 270,",
            "strony lokalne GW - Łódź nr 17, ",
            "[ DLO SZ ] - Strony Lokalne Szczecin nr 72, ",
            "strony lokalne GW - Warszawa nr 125, ",
            "[ DLO KR ] - strony lokalne GW - Kraków nr 5, ",
            "[ DLO WA ] - Strony Lokalne Warszawa nr 152, ",
            "strony lokalne GW - Zielona G?a nr 128, ",
            "strony lokalne GW - Łódź nr 63, ",
            "",
            "GW Szczecin nr 50",
            "GW TORUŃ nr 96, wydanie z dnia 23/04/2004WYDARZENIA, str. 3",
            "GW Lublin nr 33, wydanie z dnia 08/02/2006WYDARZENIA , str. 3",
            "GW Wrocław nr 45, wydanie z dnia 23/02/2004WYDARZENIA, str. 3"
        );
        Pattern p = Pattern.compile("\\b(?:[sS]trony\\s+[lL]okalne(?:\\s+GW\\s+-)?|GW)\\s+(.*?)\\s+nr\\b");
        for (String str : strs) {
            Matcher m = p.matcher(str);
            if (m.find()) {
                System.out.println("\"" + str + "\" => " + m.group(1));
            }
        }
    }
}

Output:

"strony lokalne GW - Częstochowa nr 28" => Częstochowa
"[ DLO SZ ] - Strony Lokalne Szczecin nr " => Szczecin
"strony lokalne GW - Olsztyn nr 111" => Olsztyn
"[ DLO KI ] - strony lokalne GW - Kielce nr 270," => Kielce
"strony lokalne GW - Łódź nr 17, " => Łódź
"[ DLO SZ ] - Strony Lokalne Szczecin nr 72, " => Szczecin
"strony lokalne GW - Warszawa nr 125, " => Warszawa
"[ DLO KR ] - strony lokalne GW - Kraków nr 5, " => Kraków
"[ DLO WA ] - Strony Lokalne Warszawa nr 152, " => Warszawa
"strony lokalne GW - Zielona G?a nr 128, " => Zielona G?a
"strony lokalne GW - Łódź nr 63, " => Łódź
"GW Szczecin nr 50" => Szczecin
"GW TORUŃ nr 96, wydanie z dnia 23/04/2004WYDARZENIA, str. 3" => TORUŃ
"GW Lublin nr 33, wydanie z dnia 08/02/2006WYDARZENIA , str. 3" => Lublin
"GW Wrocław nr 45, wydanie z dnia 23/02/2004WYDARZENIA, str. 3" => Wrocław