I have markdown content with multiple images where each image is:
![Image Description](/image-path.png)
I am selecting all images in the markdown content using Regex:
var matches = new Regex(@"!\[.*?\]\((.*?)\)").Matches(content);
I am getting 2 groups:
Groups[0] = ![Image Description](/image-path.png) > (Everything)
Groups[1] = /image-path.png > (Image Path)
Wouldn't it be possible to get the following instead?
Groups[0] = Image Description > (Image Description)
Groups[1] = /image-path.png > (Image Path)
CodePudding user response:
Currently, the group 1 value is part of the matched string.
You could instead match only Image Description and capture /image-path.png in group 1 by using a lookbehind and a lookahead with a capture group, so that Groups[0] is the description and Groups[1] is the path:
(?<=!\[)[^][]*(?=]\(([^()]*)\))
The pattern in parts matches:
(?<=!\[) : Positive lookbehind, assert ![ directly to the left
[^][]* : Match zero or more characters other than [ and ]
(?= : Positive lookahead, assert what follows to the right
]\(([^()]*)\) : Match ]( and capture in group 1 what is between the parentheses, then match )
) : Close the lookahead
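To show how the lookaround pattern above could be used from C#, here is a minimal sketch. It assumes the sample input from the question and a compiler that supports top-level statements (C# 9 or later); the variable names description and path are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string content = "![Image Description](/image-path.png)";

// Lookbehind/lookahead pattern from this answer: only the description is
// consumed by the match, while the path is captured inside the lookahead.
var regex = new Regex(@"(?<=!\[)[^][]*(?=]\(([^()]*)\))");
Match match = regex.Match(content);

string description = match.Value;     // same as match.Groups[0].Value
string path = match.Groups[1].Value;  // group 1, captured in the lookahead

Console.WriteLine(description);
Console.WriteLine(path);
```

Because the lookarounds do not consume characters, the whole match holds only the description, which is what the question asks for in Groups[0].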
CodePudding user response:
You can capture the relevant sections of your content text by using a capture group.
Compare your regex and mine, where I made a very small change by adding parentheses to capture the Image Description part of your content:
Yours: !\[.*?\]\((.*?)\)
Mine: !\[(.*?)\]\((.*?)\)
Capture groups are automatically numbered from left to right, starting at index 1, so these groups are available as:
matches[0].Groups[1] : which contains Image Description
matches[0].Groups[2] : which contains /image-path.png
matches[0].Groups[0] is still the whole match.
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string content = @"![Image Description](/image-path.png)";
        // Group 1 captures the description, group 2 captures the path.
        var matches = new Regex(@"!\[(.*?)\]\((.*?)\)").Matches(content);
        Console.WriteLine(matches[0].Groups[1]); // description
        Console.WriteLine(matches[0].Groups[2]); // path
    }
}
This outputs:
Image Description
/image-path.png
Here is a Runnable .NET Fiddle of the above.
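Since the question mentions content with multiple images, the same two-group pattern can be applied to every match. The following is a minimal sketch, assuming made-up sample content and a compiler that supports top-level statements (C# 9 or later); the images list and its tuple field names are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample content with two images, for illustration only.
string content = "![First](/first.png)\nSome text\n![Second](/second.png)";

// Same pattern as above: group 1 is the description, group 2 is the path.
var regex = new Regex(@"!\[(.*?)\]\((.*?)\)");

// Collect a (description, path) pair for every image found.
var images = new List<(string Description, string Path)>();
foreach (Match m in regex.Matches(content))
{
    images.Add((m.Groups[1].Value, m.Groups[2].Value));
}

foreach (var image in images)
{
    Console.WriteLine($"{image.Description} -> {image.Path}");
}
```

Reading Groups[1] and Groups[2] on each Match keeps the loop independent of how many images the content contains.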