Regex, match anything between two strings

Time:07-07

I feel like this is trivial but can't find any solution that works for me.

I have a string of this sort :

cn=doc_medical,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr|cn=doc_confidentiel,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr

I need to find the value between cn= and ,ou=tged,ou=groupes,o=choregie,c=fr. In this case I should match only doc_medical first, then doc_confidentiel.

I have this regex : (?=cn=)(.*?)(?<=,ou=tged,ou=groupes,o=choregie,c=fr) but the problem is that it obviously matches everything after the second cn= of the global string until the next ,ou=tged,ou=groupes,o=choregie,c=fr. So my second group is wrong because it contains cn=test,ou=test,ou=test,o=choregie,c=fr|cn=doc_confidentiel,ou=tged,ou=groupes,o=choregie,c=fr instead of only doc_confidentiel.

I don't know how many characters there can be between the two strings, and I can't figure out how to force the regex to start at the cn= closest before the ,ou=tged,ou=groupes,o=choregie,c=fr string instead of at the first cn= it encounters.

CodePudding user response:

You can use

(?<=cn=)[^,|]+(?=,ou=tged,ou=groupes,o=choregie,c=fr)


Details:

  • (?<=cn=) - a location immediately preceded with cn=
  • [^,|]+ - one or more chars other than | and ,
  • (?=,ou=tged,ou=groupes,o=choregie,c=fr) - a positive lookahead that requires a ,ou=tged,ou=groupes,o=choregie,c=fr string to appear immediately to the right of the current location.

See the Java demo:

import java.util.*;
import java.util.regex.*;

class Test
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String regex = "(?<=cn=)[^,|]+(?=,ou=tged,ou=groupes,o=choregie,c=fr)";
        String string = "cn=doc_medical,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr|cn=doc_confidentiel,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

Output:

doc_medical
doc_confidentiel

NOTE: If another attribute name can end in cn (that is, have more word chars to the left of cn), add a word boundary: (?<=\bcn=)[^,|]+(?=,ou=tged,ou=groupes,o=choregie,c=fr). In Java, String regex = "(?<=\\bcn=)[^,|]+(?=,ou=tged,ou=groupes,o=choregie,c=fr)";.
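As a rough illustration of the word boundary note, the sketch below compares the two patterns on an input that starts with a made-up attribute named xcn. The attribute name, the class name and the findAll helper are assumptions added for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    static final String SUFFIX = ",ou=tged,ou=groupes,o=choregie,c=fr";

    // Collects every full match of the given pattern in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "xcn" is a hypothetical attribute whose name merely ends in "cn".
        String input = "xcn=other" + SUFFIX + "|cn=doc_medical" + SUFFIX;

        String plain = "(?<=cn=)[^,|]+(?=" + SUFFIX + ")";
        String bounded = "(?<=\\bcn=)[^,|]+(?=" + SUFFIX + ")";

        System.out.println(findAll(plain, input));   // [other, doc_medical]
        System.out.println(findAll(bounded, input)); // [doc_medical]
    }
}
```

Without \b the lookbehind is satisfied by the tail of xcn=, so other is returned as well; with \b only a standalone cn= qualifies.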

CodePudding user response:

We can use a regex replacement approach here:

String input = "cn=doc_medical,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr|cn=doc_confidentiel,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr";
String cn = input.replaceAll(".*\\bcn=([^,]+),ou=tged,ou=groupes,o=choregie,c=fr.*", "$1");
System.out.println(cn);  // doc_confidentiel (only the last match, because the leading .* is greedy)

Note that in your current regex pattern, which uses lookarounds, you seemed to be confusing lookbehinds with lookaheads. But, the approach I gave above doesn't even need lookarounds.
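To make the lookaround mix-up concrete, here is a small sketch that runs three patterns against the input from the question: the original one, one with the lookbehind and lookahead swapped into place, and one that also replaces .*? with a negated character class. The class name and the findAll helper are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundMixup {
    static final String SUFFIX = ",ou=tged,ou=groupes,o=choregie,c=fr";
    static final String OTHER = "cn=test,ou=test,ou=test,o=choregie,c=fr";
    static final String INPUT =
        "cn=doc_medical" + SUFFIX + "|" + OTHER
        + "|cn=doc_confidentiel" + SUFFIX + "|" + OTHER;

    // Collects every full match of the given pattern in the input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] patterns = {
            // Original: a lookahead for cn= and a lookbehind for the suffix,
            // so both delimiters end up inside the match.
            "(?=cn=)(.*?)(?<=" + SUFFIX + ")",
            // Swapped: the delimiters are excluded, but .*? can still run
            // across | into the next entry.
            "(?<=cn=)(.*?)(?=" + SUFFIX + ")",
            // Negated class: the value can no longer cross , or |.
            "(?<=cn=)[^,|]+(?=" + SUFFIX + ")"
        };
        for (String regex : patterns) {
            for (String match : findAll(regex, INPUT)) {
                System.out.println(match);
            }
            System.out.println("---");
        }
    }
}
```

Each pattern finds two matches here, but only the third returns exactly doc_medical and doc_confidentiel. Swapping the lookarounds alone is not enough, because the lazy .*? still starts at the first cn= it meets and keeps expanding until the suffix appears.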

CodePudding user response:

You could use a capture group and, for example, a negated character class that does not cross a pipe | char:

\bcn=([^|]*),ou=tged,ou=groupes,o=choregie,c=fr\b


If the value you want is always the first one right after cn=, then excluding commas instead also works:

\bcn=([^,]*),ou=tged,ou=groupes,o=choregie,c=fr\b

Explanation

  • \bcn= Match the word cn and then =
  • ([^,]*) Capture group 1, match zero or more chars other than a comma
  • ,ou=tged,ou=groupes,o=choregie,c=fr\b Match the string


For example

String regex = "\\bcn=([^,]*),ou=tged,ou=groupes,o=choregie,c=fr\\b";
String string = "cn=doc_medical,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr|cn=doc_confidentiel,ou=tged,ou=groupes,o=choregie,c=fr|cn=test,ou=test,ou=test,o=choregie,c=fr";

Pattern pattern = Pattern.compile(regex);  // no flags needed: the pattern has no ^ or $ anchors
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    System.out.println(matcher.group(1));
}

Output

doc_medical
doc_confidentiel
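
The two capture-group variants above behave differently once extra components sit between the cn value and the fixed suffix. The sketch below shows this on a made-up entry containing an additional ou=extra part; that entry, the class name and the captures helper are assumptions for illustration, not part of the original data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupVariants {
    static final String SUFFIX = ",ou=tged,ou=groupes,o=choregie,c=fr";

    // Returns the text captured by group 1 for every match of the pattern.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical entry with an extra ou= component before the suffix.
        String input = "cn=doc_extra,ou=extra" + SUFFIX;

        String noPipe = "\\bcn=([^|]*)" + SUFFIX + "\\b";
        String noComma = "\\bcn=([^,]*)" + SUFFIX + "\\b";

        System.out.println(captures(noPipe, input));  // [doc_extra,ou=extra]
        System.out.println(captures(noComma, input)); // []
    }
}
```

So [^|]* accepts anything inside one pipe-separated entry, while [^,]* only matches when the suffix follows the cn value directly. On the string from the question, both variants return doc_medical and doc_confidentiel.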