I am trying to write a regex in Swift that matches an href link and extracts its information in more than one case: with double quotation marks, with single quotation marks, and with no quotation marks.
A regex to match href and extract info <a href=https://www.google.com>Google</a>.
<a href="https://www.google.com">Google</a>
<a href='https://www.google.com'>Google</a>
I have found this regex, but it only works with double quotation marks:
<a href="([^"]+)">([^<]+)<\/a>
Result:
Match 1: <a href="https://www.google.com">Google</a>
Group 1: https://www.google.com
Group 2: Google
What I want is to match all three forms shown in the sample text.
Note: I know that regex shouldn't be used for parsing HTML, but I am using it for a very small use case so it's fine.
CodePudding user response:
The answer is already in the comments, but I am posting this since the approach is a bit different.
In Swift 5.7 and iOS 16 you can use RegexBuilder for this.
import RegexBuilder

// The three sample strings: unquoted, double-quoted, and single-quoted href.
let link1 = "A regex to match href and extract info <a href=https://www.google.com>Google</a>."
let link2 = "<a href=\"https://www.google.com\">Google</a>"
let link3 = "<a href='https://www.google.com'>Google</a>"

// Captures a URL of the form https://www.<word>.<word>, ignoring the quotes around it.
let regex = Regex {
    Capture {
        "https://www."
        ZeroOrMore(.word)
        "."
        ZeroOrMore(.word)
    }
}

if let result1 = try? regex.firstMatch(in: link1) {
    print("link: \(result1.output.1)")
}
if let result2 = try? regex.firstMatch(in: link2) {
    print("link: \(result2.output.1)")
}
if let result3 = try? regex.firstMatch(in: link3) {
    print("link: \(result3.output.1)")
}
This works well for the three strings provided above, but depending on your scenario you might need to change the implementation.
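As one possible change, the builder can also be written to anchor on the tag itself and capture the link text as well as the URL. The following is only a sketch, assuming the anchor tag has no attributes other than href; the names quote and anchorRegex are made up for illustration, and unlike a backreference-based pattern it does not check that the opening and closing quotes are the same character.

```swift
import RegexBuilder

// Either quote character that may surround the attribute value (hypothetical name).
let quote = CharacterClass.anyOf("'\"")

// Matches <a href=URL>text</a> with optional quotes around URL.
// Assumption: href is the only attribute in the tag.
let anchorRegex = Regex {
    "<a href="
    Optionally(quote)
    Capture {
        // URL: everything up to a quote or the closing '>'
        OneOrMore(CharacterClass.anyOf("'\">").inverted)
    }
    Optionally(quote)
    ">"
    Capture {
        // Link text: everything up to the next '<'
        OneOrMore(CharacterClass.anyOf("<").inverted)
    }
    "</a>"
}

let samples = [
    "A regex to match href and extract info <a href=https://www.google.com>Google</a>.",
    "<a href=\"https://www.google.com\">Google</a>",
    "<a href='https://www.google.com'>Google</a>",
]

for sample in samples {
    if let match = sample.firstMatch(of: anchorRegex) {
        // match.1 is the URL, match.2 is the link text
        print("link: \(match.1), text: \(match.2)")
    }
}
```

Like the code above, this needs Swift 5.7 and iOS 16 (or macOS 13) because it relies on RegexBuilder.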
CodePudding user response:
Assuming there are no other attributes in the anchor tags of the file you wish to parse, you can use the following regex: /<a href=('|"|)([^'">]+)\1>([^<]+)<\/a>/$2 $3/gm
It first captures either a single quote, a double quote, or nothing, and then \1 recalls that capturing group, so the closing quote must be the same as the opening one. You can watch it live on regex101.
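For use from Swift, here is a minimal sketch of the same pattern with NSRegularExpression, which supports the \1 backreference. The helper name extractLinks and its return shape are assumptions made for this example; group 2 is the URL and group 3 is the link text, as in the regex above.

```swift
import Foundation

// Same pattern as above: group 1 = opening quote (or nothing),
// group 2 = URL, \1 = the same quote again, group 3 = link text.
let pattern = #"<a href=('|"|)([^'">]+)\1>([^<]+)</a>"#

// Hypothetical helper: returns every (url, text) pair found in `html`.
func extractLinks(from html: String) -> [(url: String, text: String)] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, range: fullRange).compactMap { match -> (url: String, text: String)? in
        // Convert the NSRange of each capture group back to a Swift range.
        guard let urlRange = Range(match.range(at: 2), in: html),
              let textRange = Range(match.range(at: 3), in: html) else { return nil }
        return (url: String(html[urlRange]), text: String(html[textRange]))
    }
}

let inputs = [
    "A regex to match href and extract info <a href=https://www.google.com>Google</a>.",
    "<a href=\"https://www.google.com\">Google</a>",
    "<a href='https://www.google.com'>Google</a>",
]

for input in inputs {
    for link in extractLinks(from: input) {
        print("link: \(link.url), text: \(link.text)")
    }
}
```

Because of the backreference, a tag with mismatched quotes such as href="https://www.google.com' is not matched at all.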