How to implement an OR after a group in a regex pattern

Time:07-14

I want to get the thread id from my URLs with a single pattern. The pattern should have just one group (at level 1). My test strings are:

https://www.mypage.com/thread-3306-page-32.html
https://www.mypage.com/thread-3306.html
https://www.mypage.com/Thread-String-Thread-Id

So I want a pattern that gives me the number 3306 for lines 1 and 2, and "String-Thread-Id" for the last line.

My current attempt is .*[t|T]hread-(.*)[\-page.*|.html], but it fails at the end, after the id. How can I do this properly? I also solved it with .*Thread-(.*)|.*thread-(\\w+).*, but that uses two groups, which does not work with my Java code.

CodePudding user response:

I don't know whether this fits every situation, but I would try this:

^.*?thread-((?:(?!-page|\.html).)*)

In Java, that could look something like this:

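import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// subjectString is assumed to hold the input, e.g. one URL per line (hence MULTILINE).
List<String> matchList = new ArrayList<>();
Pattern regex = Pattern.compile("^.*?thread-((?:(?!-page|\\.html).)*)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group(1)); // group 1 is the thread id
}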

Explanation:

^                  # Match start of line
.*?                # Match any number of characters, as few as possible
thread-            # until "thread-" is matched.
(                  # Then start a capturing group (number 1) to match:
 (?:               # (start of non-capturing group)
  (?!-page|\.html) # assert that neither "-page" nor ".html" follows
  .                # then match any character
 )*                # repeat as often as possible
)                  # end of capturing group
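To check the pattern against the three test strings from the question, here is a minimal sketch. The class name ThreadIdExtractor and the helper extract are made up for illustration, and each URL is assumed to be matched on its own, so MULTILINE is not needed here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadIdExtractor {
    // Same pattern as above; CASE_INSENSITIVE lets "thread-" also match "Thread-".
    private static final Pattern THREAD_ID = Pattern.compile(
            "^.*?thread-((?:(?!-page|\\.html).)*)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns group 1, or empty if the URL contains no "thread-".
    static Optional<String> extract(String url) {
        Matcher m = THREAD_ID.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://www.mypage.com/thread-3306-page-32.html",
            "https://www.mypage.com/thread-3306.html",
            "https://www.mypage.com/Thread-String-Thread-Id"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + extract(url).orElse("(no match)"));
        }
    }
}
```

For the three URLs this yields 3306, 3306 and String-Thread-Id. Because .*? is lazy, the capture starts after the first "thread-" in the URL, which is why the third URL keeps its inner "Thread-".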