Let's say I have a structure that can reference its own elements multiple times:
<?xml version="1.0" encoding="UTF-8"?>
<book category="cooking">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
  Blah Blah Blah Bleh Blah of <year/> written by <author/>
</book>
How can I parse this XML (or, better said, how should I describe its structure) so that these internal references are preserved?
type Book struct {
	t    string `xml:"book>title"`
	p    string `xml:"book>price"`
	y    string `xml:"book>year"`
	a    string `xml:"book>author"`
	blah string ???????
}
The naïve approach (https://go.dev/play/p/JVM98pCcI0D), just describing blah as cdata, is obviously wrong, because the references <year/> and <author/> get lost.
What is the right way to define blah here so that its internal structure is still available after parsing?
CodePudding user response:
A solution based on icza's comment:
// UnmarshalXML assumes Book has exported string fields Title, Author, Year,
// Price and Text, and that encoding/xml, io and strings are imported.
func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	for {
		t, err := d.Token()
		if err != nil {
			if err != io.EOF {
				return err
			}
			return nil // end of the <book> element
		}
		switch t := t.(type) {
		case xml.StartElement:
			var f interface{} // field to decode the element into
			var r string      // replacement text for a reference
			switch t.Name.Local {
			case "title":
				f = &b.Title
			case "author":
				if len(b.Author) > 0 { // if "author" was already decoded then assume this is the element in the "blah chardata"
					r = b.Author // if you want <author/> to appear in Text then do `r = "<author/>"` instead
				} else {
					f = &b.Author
				}
			case "year":
				if len(b.Year) > 0 { // same logic as for author above
					r = b.Year
				} else {
					f = &b.Year
				}
			case "price":
				f = &b.Price
			}
			if f != nil {
				if err := d.DecodeElement(f, &t); err != nil {
					return err
				}
			}
			if len(r) > 0 {
				b.Text += " " + r + " " // add empty space for padding the replacement string
			}
		case xml.CharData:
			s := strings.TrimSpace(string(t))
			if len(s) > 0 {
				b.Text += s // append, so earlier text chunks are kept
			}
		}
	}
}
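The method above flattens the references into Text. If the goal is to keep the internal structure available after parsing, as the question asks, one possible variation is to store the mixed content as an ordered list of pieces instead of a single string. The following is only a sketch under assumptions: the names Segment, Blah, Render, parseBook and sample are hypothetical and not part of the original answer, and it reuses the answer's heuristic that the second occurrence of an element name is a reference.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Segment is a hypothetical type: one piece of the mixed content, either
// literal text (Text set) or a reference to a sibling element (Ref set).
type Segment struct {
	Text string
	Ref  string
}

// Book keeps the plain fields plus the mixed content as ordered segments.
type Book struct {
	Title  string
	Author string
	Year   string
	Price  string
	Blah   []Segment
}

// UnmarshalXML fills a field on the first occurrence of an element name and
// records every later (or unknown) element as a reference.
func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	fields := map[string]*string{
		"title": &b.Title, "author": &b.Author,
		"year": &b.Year, "price": &b.Price,
	}
	seen := map[string]bool{}
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			name := t.Name.Local
			if f, ok := fields[name]; ok && !seen[name] {
				seen[name] = true
				if err := d.DecodeElement(f, &t); err != nil {
					return err
				}
				continue
			}
			b.Blah = append(b.Blah, Segment{Ref: name})
			if err := d.Skip(); err != nil { // consume the rest of <name/>
				return err
			}
		case xml.CharData:
			if s := strings.TrimSpace(string(t)); s != "" {
				b.Blah = append(b.Blah, Segment{Text: s})
			}
		case xml.EndElement:
			return nil // only </book> reaches here; child ends are consumed above
		}
	}
}

// Render resolves each reference against the decoded fields.
func (b *Book) Render() string {
	vals := map[string]string{
		"title": b.Title, "author": b.Author, "year": b.Year, "price": b.Price,
	}
	parts := make([]string, 0, len(b.Blah))
	for _, s := range b.Blah {
		if s.Ref != "" {
			parts = append(parts, vals[s.Ref])
		} else {
			parts = append(parts, s.Text)
		}
	}
	return strings.Join(parts, " ")
}

// parseBook is a small helper around xml.Unmarshal.
func parseBook(data string) (Book, error) {
	var b Book
	err := xml.Unmarshal([]byte(data), &b)
	return b, err
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
Blah Blah Blah Bleh Blah of <year/> written by <author/>
</book>`

func main() {
	b, err := parseBook(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(b.Render()) // Blah Blah Blah Bleh Blah of 2005 written by Giada De Laurentiis
}
```

With this layout the references stay visible after parsing (b.Blah[1].Ref is "year", b.Blah[3].Ref is "author"), and the caller decides whether to substitute the values, as Render does, or to emit the tags again.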