Home > Enterprise >  Updating Python lxml etree subElement attributes with values stored in CSV file
Updating Python lxml etree subElement attributes with values stored in CSV file

Time:11-04

I'm new to Python and I may have overlooked something here, but I can't find a solution. My problem share similarities with the question asked here, but because it uses Pandas I haven't been able to find a similiar solution for my case: Updating LXML attributes with values stored in CSV file

How can I fetch the subElement attributes from the row in the CSV instead of writing it as constants? My test CSV looks likes this:

Gruppering,Personnummer,Anstallningsnummer,From,Tom,Avtalstillhorighet,Pensionsalder,Anstallningstyp,Avslutsorsak
1967112912341,196711291234,1,2012-05-01,2012-05-31,PA03,65,H3,S7
1979041212341,197904121234,1,2011-03-01,2011-03-31,PA03,65,H3,S7
1979041212342,197904121234,2,2012-03-01,2012-03-31,PA03,65,H3,S7
1979041212343,197904121234,3,2013-03-01,2013-03-31,PA03,65,H3,S7
1977012912341,197701291234,1,2011-11-01,2011-11-30,PA03,65,H3,S7
1974040412341,197404041234,1,2011-03-01,2011-03-31,PA03,65,H3,S7
1988090712341,198809071234,1,2012-06-01,2012-06-30,PA03,65,H3,S7
1988090712342,198809071234,2,2013-06-01,2013-06-30,PA03,65,H3,S7

And my code creates an XML that look like this:

<Arbetstagare Personnummer="Test">
    <Anstallning Anstallningsnummer="Test">
      <Period>
        <From>2012-05-01</From>
        <Tom>2012-05-31</Tom>
        <Avtalstillhorighet>PA03</Avtalstillhorighet>
        <Pensionsalder>65</Pensionsalder>
        <Anstallningstyp>H3</Anstallningstyp>
        <Avslutsorsak>S7</Avslutsorsak>
      </Period>
    </Anstallning>
  </Arbetstagare>

Instead of Personnummer="Test" and Anstallningsnummer="Test" I would like the attribute values to be the value from the row in the CSV, column 2 and 3. So the first two rows of the XML should read:

<Arbetstagare Personnummer="196711291234">
    <Anstallning Anstallningsnummer="1">

The relevant part of my code looks like this:

from lxml import etree
from itertools import groupby
import csv

with open(r'\\ora-2.ug.test.se\utdata\Diverse filer\Öppna H3\Arbetsdokument\TestCSV.csv') as data_file:
    reader = csv.DictReader(data_file)
    f = lambda x: x['Gruppering']
    data = sorted(list(reader), key=f)

    root = etree.Element('root')
    for k, g in groupby(data, key=f):
        Personnummer = etree.SubElement(root, 'Arbetstagare', Personnummer='Test') 
        Anstallningsnummer = etree.SubElement(Personnummer, 'Anstallning', Anstallningsnummer='Test')

I guess I must use either Personnummer.attrib["Personnummer"] = ??? or Personnummer.set(attr, value) but how do I fetch the value from the CSV?

CodePudding user response:

After consulting the DictReader documentation here I would recommend something like this:

from lxml import etree
from itertools import groupby
import csv

with open(r'TestCSV.csv') as data_file:
    reader = csv.DictReader(data_file)
    root = etree.Element('root')
    for row in sorted(reader, key=lambda x:x['Gruppering']):
        Personnummer = etree.SubElement(root, 'Arbetstagare', Personnummer=row['Personnummer'])
        Anstallningsnummer = etree.SubElement(Personnummer, 'Anstallning', Anstallningsnummer=row['Anstallningsnummer'])
        Period = etree.SubElement(Anstallningsnummer, 'Period')
        etree.SubElement(Period, 'From', ).text = row['From']
        etree.SubElement(Period, 'Tom', ).text = row['Tom']
        etree.SubElement(Period, 'Avtalstillhorighet', ).text = row['Avtalstillhorighet']
        etree.SubElement(Period, 'Pensionsalder', ).text = row['Pensionsalder']
        etree.SubElement(Period, 'Anstallningstyp', ).text = row['Anstallningstyp']
        etree.SubElement(Period, 'Avslutsorsak', ).text = row['Avslutsorsak']

with open('testoutput.xml', 'wb') as f:
    f.write(etree.tostring(root))


data in your example is merely list of Python dictionaries, and the information in each one can be accessed as such.

You are probably also overthinking how to group by "Gruppering," and only need to sort by it.

Output:

<root>
    <Arbetstagare Personnummer="196711291234">
        <Anstallning Anstallningsnummer="1">
            <Period>
                <From>2012-05-01</From>
                <Tom>2012-05-31</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
    <Arbetstagare Personnummer="197404041234">
        <Anstallning Anstallningsnummer="1">
            <Period>
                <From>2011-03-01</From>
                <Tom>2011-03-31</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
    <Arbetstagare Personnummer="197701291234">
        <Anstallning Anstallningsnummer="1">
            <Period>
                <From>2011-11-01</From>
                <Tom>2011-11-30</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
    <Arbetstagare Personnummer="197904121234">
        <Anstallning Anstallningsnummer="1">
            <Period>
                <From>2011-03-01</From>
                <Tom>2011-03-31</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
    <Arbetstagare Personnummer="197904121234">
        <Anstallning Anstallningsnummer="2">
            <Period>
                <From>2012-03-01</From>
                <Tom>2012-03-31</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
    <Arbetstagare Personnummer="197904121234">
        <Anstallning Anstallningsnummer="3">
            <Period>
                <From>2013-03-01</From>
                <Tom>2013-03-31</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
    <Arbetstagare Personnummer="198809071234">
        <Anstallning Anstallningsnummer="1">
            <Period>
                <From>2012-06-01</From>
                <Tom>2012-06-30</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
    <Arbetstagare Personnummer="198809071234">
        <Anstallning Anstallningsnummer="2">
            <Period>
                <From>2013-06-01</From>
                <Tom>2013-06-30</Tom>
                <Avtalstillhorighet>PA03</Avtalstillhorighet>
                <Pensionsalder>65</Pensionsalder>
                <Anstallningstyp>H3</Anstallningstyp>
                <Avslutsorsak>S7</Avslutsorsak>
            </Period>
        </Anstallning>
    </Arbetstagare>
</root>
  • Related