Unmarshal flat XML to go data structure

Time:04-14

I have a flat XML structure that I am trying to unmarshal into a Go data structure. I want to get the list of items (item-name) in each bucket from the XML below, i.e. bucket1 = [apple, orange, grapes] and bucket2 = [apple, mangoes].

When I unmarshal the XML into the Go data structure below, I get the list of bucket names and the list of items, but I cannot map the items to their respective buckets, since each bucket can have many items. Is there a way to achieve this by changing the Go data structure? I don't control the structure of the XML, so I can't change it to suit my requirement. I am new to Go and would appreciate any input.

type buckets struct {
    XMLName    xml.Name `xml:"buckets"`
    BucketName []string `xml:"bucket-name"`
    ItemName   []string `xml:"item-name"`
    Weight     []string `xml:"weight"`
    Quantity   []string `xml:"quantity"`
}
        
    
    <?xml version="1.0" encoding="UTF-8"?>
    <buckets>
       <bucket-name>bucket1</bucket-name>
       <item-name>apple</item-name>
       <weight>500</weight>
       <quantity>3</quantity>
       <item-name>orange</item-name>
       <weight>500</weight>
       <quantity>2</quantity>
       <item-name>grapes</item-name>
       <weight>800</weight>
       <quantity>1</quantity>
       <bucket-name>bucket2</bucket-name>
       <item-name>apple</item-name>
       <weight>500</weight>
       <quantity>3</quantity>
       <item-name>mangoes</item-name>
       <weight>400</weight>
       <quantity>2</quantity>
    </buckets>
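
To make the problem concrete, here is a minimal sketch that decodes a shortened version of the XML into a trimmed copy of the struct above. The names flatBuckets and decodeFlat are made up for this illustration. Each slice collects every matching element in document order, so nothing records which bucket an item came from.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// flatBuckets is a trimmed copy of the struct from the question
// (hypothetical name, only two of the fields are kept).
type flatBuckets struct {
	XMLName    xml.Name `xml:"buckets"`
	BucketName []string `xml:"bucket-name"`
	ItemName   []string `xml:"item-name"`
}

// decodeFlat unmarshals doc with the standard struct-tag mapping.
func decodeFlat(doc string) (flatBuckets, error) {
	var fb flatBuckets
	err := xml.Unmarshal([]byte(doc), &fb)
	return fb, err
}

func main() {
	fb, err := decodeFlat(`<buckets>
  <bucket-name>bucket1</bucket-name>
  <item-name>apple</item-name>
  <item-name>orange</item-name>
  <bucket-name>bucket2</bucket-name>
  <item-name>mangoes</item-name>
</buckets>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Two independent lists: the bucket boundaries are lost.
	fmt.Println(fb.BucketName)
	fmt.Println(fb.ItemName)
}
```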

CodePudding user response:

I agree with mkopriva. Go's struct-tag annotations are optimized for XML that holds identically-structured data records. Using them for mixed content is like putting a saddle on a cow. Plug: I have written code for handling mixed content that is on GitHub, and I'd welcome feedback.

CodePudding user response:

What you are trying to do can be achieved by implementing a custom UnmarshalXML method (the xml.Unmarshaler interface) and manually mapping the buckets to a Go struct.

The code described below assumes that the XML elements come in the same order as the example provided.

First, we define structs that model the buckets and items described in the question:

type Buckets struct {
    XMLName xml.Name `xml:"buckets"`
    Buckets []*Bucket
}

type Bucket struct {
    BucketName string `xml:"bucket-name"`
    Items      []*Item
}

type Item struct {
    Name     string `xml:"item-name"`
    Weight   int    `xml:"weight"`
    Quantity int    `xml:"quantity"`
}

Next we need to implement the xml.Unmarshaler interface by defining the UnmarshalXML method on the Buckets struct. This method is called when we call xml.Unmarshal and pass a Buckets struct as the destination.

func (b *Buckets) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    b.XMLName = start.Name

    var currentBucket *Bucket
    var currentItem *Item
    for {
        t, err := d.Token()
        if err == io.EOF {
            // end of the <buckets> element (needs the "io" import):
            // append the last bucket, if any, before exiting
            if currentBucket != nil {
                b.Buckets = append(b.Buckets, currentBucket)
            }
            break
        }
        if err != nil {
            return err
        }
        switch se := t.(type) {
        case xml.StartElement:
            switch se.Name.Local {
            case "bucket-name":
                // a new bucket starts here: append the previous one, if any
                // (currentBucket is nil the first time), then start a fresh bucket
                if currentBucket != nil {
                    b.Buckets = append(b.Buckets, currentBucket)
                }
                currentBucket = &Bucket{}

                if err := d.DecodeElement(&currentBucket.BucketName, &se); err != nil {
                    return err
                }
            case "item-name":
                currentItem = &Item{}
                if err := d.DecodeElement(&currentItem.Name, &se); err != nil {
                    return err
                }
            case "weight":
                if err := d.DecodeElement(&currentItem.Weight, &se); err != nil {
                    return err
                }
            case "quantity":
                if err := d.DecodeElement(&currentItem.Quantity, &se); err != nil {
                    return err
                }

                // since quantity comes last, append the item to the bucket and reset it
                currentBucket.Items = append(currentBucket.Items, currentItem)
                currentItem = &Item{}
            }
        }
    }

    return nil
}

What we are essentially doing is looping over the XML elements and mapping them to our structs with custom logic. I won't go into detail about d.Token() and xml.StartElement here; the encoding/xml docs cover them.
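
For readers who have not used the token API, here is a minimal sketch of the loop in isolation. The helper name elementNames is made up for this illustration. It reads tokens until d.Token() returns io.EOF and records the local name of every xml.StartElement it sees.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// elementNames returns the local name of every start element in doc,
// in document order.
func elementNames(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var names []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return names, nil // end of input
		}
		if err != nil {
			return nil, err // malformed XML
		}
		// Only start tags matter here; end tags and character data are skipped.
		if se, ok := tok.(xml.StartElement); ok {
			names = append(names, se.Name.Local)
		}
	}
}

func main() {
	names, err := elementNames(`<buckets><bucket-name>bucket1</bucket-name><item-name>apple</item-name></buckets>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names) // prints [buckets bucket-name item-name]
}
```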

Now let's break down the above method:

  • When we meet an element named bucket-name, we know that a new bucket follows, so we append the bucket processed so far to the slice (checking for nil, since the first time there is no processed bucket yet) and set currentBucket to a new Bucket (the one we are going to process).
  • When we meet an element named item-name, we know that a new item follows, so we set currentItem to a new Item.
  • When we meet an element named quantity, we know that it is the last element belonging to currentItem, so we append the item to currentBucket.Items.
  • When d.Token() finally reports the end of the input (a nil token together with io.EOF), the loop ends. Since a bucket is only appended when the next one starts, the last bucket (or the only one, if there is a single bucket) has not been appended yet. So, before we break, we need to append the last processed one.

Notes:

  • You could avoid the Buckets struct entirely and write a function that handles the unmarshaling by using an xml.Decoder directly, like this:
func UnmarshalBuckets(rawXML []byte) []*Bucket {
    // or any io.Reader that points to the xml data
    d := xml.NewDecoder(bytes.NewReader(rawXML))
    ...
}
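
As a rough sketch of how such a function could be filled in (this is one possible body, not necessarily what was elided above): the name parseBuckets, the added error return, and the simplified untagged structs are assumptions made for this illustration. It appends each bucket and item as soon as its opening element is seen, so no flush is needed at the end of the input and quantity does not have to come last.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

type Item struct {
	Name     string
	Weight   int
	Quantity int
}

type Bucket struct {
	Name  string
	Items []*Item
}

// parseBuckets (hypothetical helper) groups items under the most recent
// <bucket-name>. It assumes every item starts with <item-name>.
func parseBuckets(rawXML []byte) ([]*Bucket, error) {
	d := xml.NewDecoder(bytes.NewReader(rawXML))
	var buckets []*Bucket
	var bucket *Bucket
	var item *Item
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return buckets, nil
		}
		if err != nil {
			return nil, err
		}
		se, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		switch se.Name.Local {
		case "bucket-name":
			bucket, item = &Bucket{}, nil
			buckets = append(buckets, bucket)
			err = d.DecodeElement(&bucket.Name, &se)
		case "item-name":
			if bucket == nil {
				return nil, fmt.Errorf("item-name before any bucket-name")
			}
			item = &Item{}
			bucket.Items = append(bucket.Items, item)
			err = d.DecodeElement(&item.Name, &se)
		case "weight", "quantity":
			if item == nil {
				return nil, fmt.Errorf("%s before any item-name", se.Name.Local)
			}
			if se.Name.Local == "weight" {
				err = d.DecodeElement(&item.Weight, &se)
			} else {
				err = d.DecodeElement(&item.Quantity, &se)
			}
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	raw := []byte(`<buckets>
  <bucket-name>bucket1</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>orange</item-name><weight>500</weight><quantity>2</quantity>
  <bucket-name>bucket2</bucket-name>
  <item-name>mangoes</item-name><weight>400</weight><quantity>2</quantity>
</buckets>`)
	buckets, err := parseBuckets(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, b := range buckets {
		fmt.Print(b.Name, ":")
		for _, it := range b.Items {
			fmt.Printf(" %s(weight=%d, quantity=%d)", it.Name, it.Weight, it.Quantity)
		}
		fmt.Println()
	}
}
```

Returning an error instead of panicking on an out-of-order element is a design choice of this sketch; the method above would dereference a nil pointer in the same situation.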

Disclaimers:

  • I know the code above feels a bit sketchy, and I am sure it can be improved. Feel free to play with it and implement the custom logic in a more readable way.
  • There are probably edge cases that I didn't cover or that are not present in the example provided. You should analyze your XML and try (if possible) to cover them.
  • As already mentioned, the code depends heavily on the order of the XML elements.

