How to change the structure of an XML in Python

Time:05-14

From this configuration:

label_config={
    "label1": [
        "modality1",
        "modality2",
        "modality3"],
    "choice":"single",
    "required": "true",
    "name" : "sentiment"},{
    "label2": [
        "modality1",
        "modality2"],
    "name" : "price"
 }

I created this XML, which is printed:

(Screenshot of the generated XML, with some elements highlighted in yellow; image not available.)

Does anyone know how, using lxml (from lxml import etree), I can move the slashes of the yellow elements from the end to the beginning?

Here is the code of the generation :

from lxml import etree
import sys

def topXML(dictAttrib=None):
    root = etree.Element("View")
    textEl = etree.SubElement(root, "Text")
    # Fall back to the default attributes when none are supplied.
    if dictAttrib is None:
        dictAttrib = {
            "name": "text",
            "value": "$text"
        }
    # Apply the attributes whether they were passed in or defaulted.
    for k_, v_ in dictAttrib.items():
        textEl.set(k_, v_)

    return root

def choiceXML(root,locChoice):

    headerEl = etree.SubElement(root, "Header")
    choisesEl = etree.SubElement(root, "Choices")
    for k_,v_ in locChoice.items():
        if isinstance(k_, str) and isinstance(v_, list):
            choices = v_
            headerEl.set("value", k_)
            if locChoice.get("toName") is None:
                choisesEl.set("toName","text")
            for op_ in choices:
                opEl = etree.SubElement(root, "Choice")
                opEl.set("value",op_)
        else :
            choisesEl.set(k_,v_)
    choisesEl = etree.SubElement(root, "Choices")
    
    return root

def checkConfig(locChoice):

    if locChoice.get("name") is None:
        sys.exit("Warning : label_config needs a parameter called 'name' assigned")

def xmlConstructor(label_config):

    root = topXML()
    for ch_ in label_config:
        checkConfig(ch_)
        root = choiceXML(root,ch_)
    return root

The generated XML will be used on this site: https://labelstud.io/playground/ . They use some flavor of XML to define the labeling configuration. Unfortunately, my etree code does not produce the output I want, and I found that if I make the changes described above, it works.

In the meantime I am contacting their team to get more info, but if someone here has any idea how to make it work, please come forward.

I am open to any advice, but I am looking for a well-structured solution rather than building the XML by appending pieces of text.

CodePudding user response:

<Choices/> is shorthand for <Choices></Choices> (see the XML spec). If you simply turn it into a closing tag, you probably won't have a matching opening tag, and the result will be invalid XML. Any program trying to read or parse it will error out.
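To illustrate the point about the short form, here is a minimal sketch. It assumes lxml is installed and otherwise falls back to the standard library's xml.etree.ElementTree, which offers the same basic calls; the element and attribute values are only examples.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback assumption: the stdlib module offers the same basic API.
    import xml.etree.ElementTree as etree

# An element with no children and no text is serialized in the short form.
empty = etree.Element("Choices")
empty_xml = etree.tostring(empty, encoding="unicode")

# As soon as the element has a child, the serializer emits a start tag
# and a separate end tag around that child.
parent = etree.Element("Choices")
etree.SubElement(parent, "Choice").set("value", "modality1")
parent_xml = etree.tostring(parent, encoding="unicode")

print(empty_xml)
print(parent_xml)
```

So the trailing slash is not something to move by hand: the serializer picks the form depending on whether the element has content.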

Notice that you have trailing slashes on all your <Choices> elements, including the ones that are meant to be non-empty.

If you don't want the empty <Choices/> elements, you may need to look into how you generate the XML from the dict. Since you don't provide an MCVE (minimal, complete, verifiable example), we can't answer that part.
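As a possible starting point for that part, here is a hedged sketch of a generator that attaches each Choice to its Choices element instead of to the root, and does not create an extra empty Choices at the end. The function name build_view is hypothetical, and it assumes the playground expects Choice elements nested inside Choices; the fallback import is only there so the sketch runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback assumption: the stdlib module offers the same basic API.
    import xml.etree.ElementTree as etree


def build_view(label_config):
    """Build a <View> tree with each <Choice> nested inside its <Choices>."""
    root = etree.Element("View")
    etree.SubElement(root, "Text", {"name": "text", "value": "$text"})
    for block in label_config:
        if "name" not in block:
            raise ValueError("each label_config entry needs a 'name' key")
        header = etree.SubElement(root, "Header")
        choices = etree.SubElement(root, "Choices")
        # Default target; overwritten below if the block defines its own.
        choices.set("toName", block.get("toName", "text"))
        for key, value in block.items():
            if isinstance(value, list):
                # The list key is the label; its items become the options.
                header.set("value", key)
                for option in value:
                    # Parent is `choices`, not `root`.
                    etree.SubElement(choices, "Choice").set("value", option)
            else:
                choices.set(key, value)
    return root


label_config = (
    {"label1": ["modality1", "modality2", "modality3"],
     "choice": "single", "required": "true", "name": "sentiment"},
    {"label2": ["modality1", "modality2"], "name": "price"},
)
view = build_view(label_config)
print(etree.tostring(view, encoding="unicode"))
```

Because the options are children of Choices, the serializer writes an opening <Choices ...> tag and a closing </Choices> tag around them on its own.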

CodePudding user response:

This is more of a comment than an answer, but it's a bit too long for a comment. Looking at what you provided, it seems the problem is not that your XML is too well formed (there's no such thing) or that the playground has some sort of weird XML structure. I believe the XML you generated is simply not what they are looking for.

If you look at your 2nd <Choices> element, it reads

<Choices toName="text" name="price"/>

Try dropping the closing / so it reads:

<Choices toName="text" name="price">

It then needs a matching closing tag, so the <Choices/> that follows would have to become </Choices>. With that, maybe it will work.
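One way to check whether a hand-edited snippet like this is still well formed is to parse it. The sketch below is only an illustration: the helper is_well_formed and the two sample strings are made up for the example, and the import falls back to the standard library if lxml is missing.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback assumption: the stdlib module offers the same basic API.
    import xml.etree.ElementTree as etree


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        etree.fromstring(xml_text)
    except etree.ParseError:
        return False
    return True


# Opening tag followed by an empty-element tag: the first <Choices> stays open.
unclosed = ('<View><Choices toName="text" name="price">'
            '<Choice value="modality1"/><Choices/></View>')
# Opening tag matched by an explicit closing tag.
closed = ('<View><Choices toName="text" name="price">'
          '<Choice value="modality1"/></Choices></View>')

print(is_well_formed(unclosed))
print(is_well_formed(closed))
```

An empty-element tag such as <Choices/> opens and closes a new element in one go, so it cannot serve as the end tag of an earlier <Choices ...>.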
