Unescape twice-escaped title in RSS

Time:06-09

I get some RSS feeds with strangely escaped titles, for example:

<title>S&amp;amp;P 500 : Wall Street amorce un rebond, Binance fait l&amp;apos;objet d&amp;apos;une enquête de la SEC</title>

The whole RSS feed: https://www.dailyfx.com/francais/feeds/actualites-marches-financiers

The Opera browser displays such titles correctly, as follows:

S&P 500 : Wall Street amorce un rebond, Binance fait l'objet d'une enquête de la SEC

How can I correctly unescape the titles both in the normal case, where I receive once-escaped titles, and in the twice-escaped case above?

Answer:

In XML, the sequence &amp; encodes a single & sign. But if the content is itself meant to be HTML, for example, the XML-decoded text may still contain HTML escape sequences.

For example, if the text to display contains &, in HTML it is encoded as &amp;. If this HTML is then inserted into an XML document, its leading & has to be escaped as well, which results in &amp;amp;.
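To make the mechanism concrete, here is a minimal Go sketch of how such a title could be produced, assuming the feed generator first HTML-escapes the title and then XML-escapes the result. The helper name `doubleEscape` is hypothetical; `html.EscapeString` and `xml.EscapeText` are standard library functions. Note that Go's `html.EscapeString` escapes an apostrophe as `&#39;` rather than `&apos;`, so this sketch reproduces the `&amp;amp;` pattern but not the exact `&amp;apos;` seen in the feed, whose generator evidently uses a different HTML escaper.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"html"
)

// doubleEscape is a hypothetical helper that mimics a feed producer which
// HTML-escapes a title and then XML-escapes the already-escaped HTML.
func doubleEscape(title string) string {
	// HTML pass: "&" becomes "&amp;" (and "'" becomes "&#39;").
	htmlEscaped := html.EscapeString(title)

	// XML pass: the "&" of "&amp;" is escaped again, giving "&amp;amp;".
	// Writing to a bytes.Buffer never fails, so the error is ignored.
	var buf bytes.Buffer
	_ = xml.EscapeText(&buf, []byte(htmlEscaped))
	return buf.String()
}

func main() {
	fmt.Println(doubleEscape("S&P 500")) // S&amp;amp;P 500
}
```

Undoing this therefore takes two decoding passes in the reverse order: XML first, then HTML.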

To get the human-readable text, first parse the XML (which removes the XML escaping), then decode the result as HTML. For the second step you may use html.UnescapeString() from Go's standard html package.

For example:

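package main

import (
	"encoding/xml"
	"fmt"
	"html"
)

func main() {
	const src = `<title>S&amp;amp;P 500 : Wall Street amorce un rebond, Binance fait l&amp;apos;objet d&amp;apos;une enquête de la SEC</title>`
	// Step 1: XML decoding turns &amp;amp; into &amp; and &amp;apos; into &apos;.
	var s string
	if err := xml.Unmarshal([]byte(src), &s); err != nil {
		panic(err)
	}
	fmt.Println(s)
	// Step 2: HTML decoding turns &amp; into & and &apos; into '.
	s = html.UnescapeString(s)
	fmt.Println(s)
}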

This will output (try it on the Go Playground):

S&amp;P 500 : Wall Street amorce un rebond, Binance fait l&apos;objet d&apos;une enquête de la SEC
S&P 500 : Wall Street amorce un rebond, Binance fait l'objet d'une enquête de la SEC