I am trying to parse the HTML of a directory listing page using C#. The page has many file URLs like "0220109_120548.046.jpg", but it also has others like "0220109_120548.046-445x265.jpg". They are the same picture, but one has its dimensions in the name.
I need a regex that matches only the URLs of the files without the dimensions.
I tried this one: href="^"*.(gif|jpg|png)"
but it's not working.
The regex101 URL: https://regex101.com/r/APS9NY/1
CodePudding user response:
Here is one way to do so:
href=\"[^\"]*?(?<!\d{2,4}x\d{2,4})\.(gif|jpg|png)\"
See here for the online demo.
href=\" : Matches href="
[^\"]*? : Any character that isn't ", between zero and unlimited times, as few as possible.
(?<!) : Negative lookbehind.
    \d{2,4} : Matches between 2 and 4 digits.
    x : Matches x.
\. : Matches a literal dot (.).
(gif|jpg|png) : Matches either gif, jpg or png.
\" : Matches ".
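As a rough sketch of how this could be used from C#, the pattern can be passed to Regex.Matches. The class name LinkFilter, the method FindUnsizedImageUrls and the sample HTML are made up for illustration. The pattern is the one above with two small changes that are my own additions: a capture group around the URL so it can be read without the href="..." wrapper, and the unnecessary backslashes before the quotes dropped. Note that it relies on a variable-length lookbehind, which .NET supports but several other regex flavors do not.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkFilter
{
    // Pattern from the answer, written as a C# verbatim string
    // (inside @"..." a double quote is written as "").
    // Assumption: group 1 was added here to capture just the URL.
    private static readonly Regex UnsizedImageHref = new Regex(
        @"href=""([^""]*?(?<!\d{2,4}x\d{2,4})\.(?:gif|jpg|png))""");

    // Returns the href values of image links whose file name does not
    // end in a WIDTHxHEIGHT suffix right before the extension.
    public static List<string> FindUnsizedImageUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in UnsizedImageHref.Matches(html))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample input using the two file names from the question.
        string html =
            "<a href=\"0220109_120548.046.jpg\">a</a>\n" +
            "<a href=\"0220109_120548.046-445x265.jpg\">b</a>\n";

        foreach (string url in LinkFilter.FindUnsizedImageUrls(html))
        {
            Console.WriteLine(url); // prints only 0220109_120548.046.jpg
        }
    }
}
```

For anything beyond a simple directory listing, an HTML parser is usually more robust than a regex, but for this kind of page the regex above should be enough.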