I receive some RSS feeds with strangely escaped titles, for example:
<title>S&amp;amp;P 500 : Wall Street amorce un rebond, Binance fait l&amp;apos;objet d&amp;apos;une enquête de la SEC</title>
The whole feed: https://www.dailyfx.com/francais/feeds/actualites-marches-financiers
The Opera browser displays such news titles correctly, as follows:
S&P 500 : Wall Street amorce un rebond, Binance fait l'objet d'une enquête de la SEC
How can I correctly unescape titles both in the normal case, where I receive once-escaped text, and in the twice-escaped case above?
Answer:
The sequence &amp; encodes a & sign. But if the content is itself meant to be HTML, for example, it may contain further HTML escape sequences.
For example, if the text to display contains &, in HTML it would be encoded as &amp;. If you insert this text into an XML document, the first character & also has to be escaped, which results in &amp;amp;.
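The two layers of escaping described above can be reproduced in Go. This is only a sketch to illustrate how a twice-escaped title can come about; escapeLayers is a hypothetical helper name, and the feed's actual publishing pipeline is not known.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"html"
)

// escapeLayers is a hypothetical helper: it escapes text as HTML first,
// then escapes that result again so it can be embedded in XML.
func escapeLayers(text string) (htmlEscaped, xmlEscaped string, err error) {
	htmlEscaped = html.EscapeString(text)

	var buf bytes.Buffer
	if err = xml.EscapeText(&buf, []byte(htmlEscaped)); err != nil {
		return "", "", err
	}
	return htmlEscaped, buf.String(), nil
}

func main() {
	h, x, err := escapeLayers("S&P")
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // S&amp;P
	fmt.Println(x) // S&amp;amp;P
}
```

Decoding has to undo these layers in the reverse order: XML first, then HTML.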
To get the human-readable decoded text, you have to parse the XML and then decode the result as HTML. For the second step you may use html.UnescapeString().
For example:
// Requires the imports "encoding/xml", "fmt" and "html".
const src = `<title>S&amp;amp;P 500 : Wall Street amorce un rebond, Binance fait l&amp;apos;objet d&amp;apos;une enquête de la SEC</title>`
var s string
// Step 1: parsing the XML removes the outer (XML) layer of escaping.
if err := xml.Unmarshal([]byte(src), &s); err != nil {
panic(err)
}
fmt.Println(s)
// Step 2: decode the remaining HTML escape sequences.
s = html.UnescapeString(s)
fmt.Println(s)
This will output (try it on the Go Playground):
S&amp;P 500 : Wall Street amorce un rebond, Binance fait l&apos;objet d&apos;une enquête de la SEC
S&P 500 : Wall Street amorce un rebond, Binance fait l'objet d'une enquête de la SEC
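To cover both cases from the question (once-escaped and twice-escaped titles), the two steps can be wrapped in one function. The following is a minimal sketch; decodeTitle is a hypothetical helper name, not part of any library. For a once-escaped title the html.UnescapeString() call simply finds nothing left to decode.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"html"
)

// decodeTitle is a hypothetical helper: it parses a single XML element
// and then decodes any HTML escape sequences left in its text.
func decodeTitle(rawXML string) (string, error) {
	var s string
	if err := xml.Unmarshal([]byte(rawXML), &s); err != nil {
		return "", err
	}
	return html.UnescapeString(s), nil
}

func main() {
	inputs := []string{
		`<title>S&amp;P 500</title>`,     // escaped once
		`<title>S&amp;amp;P 500</title>`, // escaped twice
	}
	for _, in := range inputs {
		title, err := decodeTitle(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(title) // S&P 500 in both cases
	}
}
```

One caveat: if a once-escaped title is supposed to show a literal escape sequence such as &amp; to the reader, this approach decodes it one step too far. Whether that matters depends on the feeds you consume.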