Collect Parts of a String representing <img> Tags into a List in Java 8

Time:11-16

I have a long String that can contain many <img> tags. I need to collect each entire image tag, from its start (<img src=") to its close (">), into a List.

I wrote a regex, <img.*?\"> (with the g and m flags), to select these tags, but I don't know how to collect them all into a List.

For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "<img.*?\\\">";
final String string = "Hello World <img src=\"https://dummyimage.com/300.png/09f/777\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/ff2\"> Random Text\nHello\nHello Random <img src=\"https://dummyimage.com/300.png/09f/888\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/2ff\">adaad\n";
final String replace = "";

final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);

final String result = matcher.replaceAll(replace); // Here, how can I collect all the image tags in a list

Answer:

You can simply do this:

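import java.util.ArrayList;
import java.util.List;

final List<String> result = new ArrayList<>();
while (matcher.find()) {          // advance to the next match; false when none are left
    result.add(matcher.group());  // group() returns the entire matched substring
}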

and get rid of your final String replace = ""; together with the replaceAll(...) call, which is not needed for collecting the matches.
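For reference, here is a minimal self-contained sketch of the same loop, assuming the question's pattern. The class name ImgTagFinder, the helper findImgTags and the shorter input string are illustrative only. Pattern.MULTILINE is left out because it only changes the meaning of ^ and $, which this pattern does not use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgTagFinder {

    // The question's pattern: "<img", then as few characters as possible, then the closing ">
    private static final Pattern IMG_TAG = Pattern.compile("<img.*?\">");

    // Collects every non-overlapping match of IMG_TAG, in order of appearance
    public static List<String> findImgTags(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = IMG_TAG.matcher(input);
        while (matcher.find()) {          // advance to the next match; false when none are left
            result.add(matcher.group());  // group() with no argument is the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical short input with two tags on separate lines
        String input = "Hello <img src=\"a.png\"> text\n<img src=\"b.png\">tail";
        findImgTags(input).forEach(System.out::println);
    }
}
```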

Answer:

Java 8 - Pattern.splitAsStream()

We can split the given string at zero-width positions using so-called lookaheads and lookbehinds:

  • (?<=.)(?=<) - matches a position between a character of any kind and an opening angle bracket < (i.e. it matches the empty string between any character and the beginning of a tag).

  • (?<=>)(?=.) - matches a position between a closing angle bracket > and any kind of character.

public static final Pattern ANGLE_BRACKETS =
    Pattern.compile("(?<=.)(?=<)|(?<=>)(?=.)");
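To see where this pattern cuts the text, the sketch below applies it with Pattern.split() to a short sample; the class name, the pieces helper and the input are illustrative. The tag becomes its own piece, and the text around it is split off on either side.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AngleBracketSplitDemo {

    // Zero-width split points: before every '<' that follows some character,
    // and after every '>' that is followed by some character
    static final Pattern ANGLE_BRACKETS = Pattern.compile("(?<=.)(?=<)|(?<=>)(?=.)");

    public static List<String> pieces(String input) {
        return Arrays.asList(ANGLE_BRACKETS.split(input));
    }

    public static void main(String[] args) {
        // Each piece is printed in brackets so the surrounding spaces stay visible
        pieces("Hi <img src=\"a.png\"> there").forEach(p -> System.out.println("[" + p + "]"));
    }
}
```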

Using this Pattern, we generate a stream of substrings split at the empty positions that border opening and closing angle brackets, and then keep only the strings that represent an image tag.

final String string = "Hello World <img src=\"https://dummyimage.com/300.png/09f/777\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/ff2\"> Random Text\nHello\nHello Random <img src=\"https://dummyimage.com/300.png/09f/888\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/2ff\">adaad\n";

List<String> imageTags = ANGLE_BRACKETS.splitAsStream(string)
    .filter(str -> str.trim().matches("<img[^<]+>")) // keep only pieces that form a single <img ...> tag
    .collect(Collectors.toList());                   // Java 8 has no Stream.toList() (added in Java 16)

imageTags.forEach(System.out::println);
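A complete Java 8 version of this approach might look like the following sketch; the class name, the imageTags helper and the input (which mixes <img> with other tags) are hypothetical. The filter discards plain text as well as tags such as <p>, </p> and <br>, so only the two image tags are printed.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ImgTagsJava8 {

    // Same split pattern as above: positions before '<' and after '>'
    static final Pattern ANGLE_BRACKETS = Pattern.compile("(?<=.)(?=<)|(?<=>)(?=.)");

    public static List<String> imageTags(String html) {
        return ANGLE_BRACKETS.splitAsStream(html)
                .filter(str -> str.trim().matches("<img[^<]+>")) // drop plain text and non-<img> tags
                .collect(Collectors.toList());                   // Java 8 compatible collection
    }

    public static void main(String[] args) {
        // Hypothetical input mixing <img> with other tags
        String html = "<p>Intro <img src=\"a.png\"> more</p><br><img src=\"b.png\" alt=\"B\">";
        imageTags(html).forEach(System.out::println);
    }
}
```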


Java 9 - Matcher.results()

In the regular expression, exclude the opening angle bracket < (instead of relying on the quotation mark) so that a match can never contain more than one tag:
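The difference shows up when a tag does not end with a double quote followed by >, for example a tag written with single quotes. In the hypothetical input below, the question's lazy pattern keeps going until the first "> it finds, which closes the second tag, so both tags come back as one match; <img[^<]+> cannot cross a <, so it returns each tag separately. The class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgRegexComparison {

    // Collects all non-overlapping matches of the given regex in the input
    public static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input: the first tag uses single quotes, so it contains no ">
        String input = "<img src='a.png'> and <img src=\"b.png\">";

        // Question's pattern: runs on to the first ">, swallowing both tags
        System.out.println(matches("<img.*?\">", input));
        // Answer's pattern: cannot cross a '<', so each tag is matched on its own
        System.out.println(matches("<img[^<]+>", input));
    }
}
```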

public static final Pattern IMG_TAG = Pattern.compile("<img[^<]+>");

With Matcher.results(), added in Java 9, we get a Stream<MatchResult> containing one element per match in the given string. To obtain the matched substring from each result, we can use MatchResult.group().

final String string = "Hello World <img src=\"https://dummyimage.com/300.png/09f/777\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/ff2\"> Random Text\nHello\nHello Random <img src=\"https://dummyimage.com/300.png/09f/888\"> \nMy Name <img src=\"https://dummyimage.com/300.png/09f/2ff\">adaad\n";

List<String> imageTags = IMG_TAG.matcher(string).results() // Stream<MatchResult>
    .map(MatchResult::group)                               // Stream<String>
    .collect(Collectors.toList());                         // or .toList() on Java 16+

imageTags.forEach(System.out::println);

Output:

<img src="https://dummyimage.com/300.png/09f/777">
<img src="https://dummyimage.com/300.png/09f/ff2">
<img src="https://dummyimage.com/300.png/09f/888">
<img src="https://dummyimage.com/300.png/09f/2ff">
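Because results() yields full MatchResult objects, more than the matched text is available. The sketch below, assuming Java 9+, also reports each tag's start() and end() offsets (end is exclusive); the class name, the describe helper and the input are illustrative.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ImgTagPositions {

    static final Pattern IMG_TAG = Pattern.compile("<img[^<]+>");

    // Formats each match as "start-end: tag"
    public static List<String> describe(String input) {
        return IMG_TAG.matcher(input).results()                          // Stream<MatchResult>
                .map((MatchResult r) -> r.start() + "-" + r.end() + ": " + r.group())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical short input with two tags
        describe("Hi <img src=\"a.png\"> and <img src=\"b.png\">").forEach(System.out::println);
    }
}
```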

