Multiple for loops to parse XML to CSV not working

Time:03-21

I want to write code that can be run on different XML files (all TEI-encoded) to see whether specific elements and attributes appear, how often they appear, and in what context. To do this I have written the following code:

from logging import root
import xml.etree.ElementTree as ET
import csv

f = open('orestes-elements.csv', 'w', encoding="utf-8")
writer = csv.writer(f)
writer.writerow(["Note Attributes", "Note Text", "Responsibility", "Certainty Element", "Certainty Attributes", "Certainty Text"])

tree = ET.parse(r"C:\Users\noahb\OneDrive\Desktop\Humboldt\Semester 2\Daten\Hausarbeit-TEI\edition-euripides\Orestes.xml")
root = tree.getroot()


try:
    for note in root.findall('.//note'):
        noteat = note.attrib
        notetext = note.text
        print(noteat)
        print(notetext)
    #attribute search
    for responsibility in root.findall(".//*[@resp]"):
        responsibilities = str(responsibility.tag, responsibility.attrib, responsibility.text)
    for certainty in root.findall(".//*[@cert]"):
        certaintytag = certainty.tag
        certaintyat = certainty.attrib
        certaintytext = certainty.text
    writer.writerow([noteat, notetext, responsibilities, certaintytag, certaintyat, certaintytext])
finally:
    f.close()

I get the error "NameError: name 'noteat' is not defined". I can indent writer.writerow, but then the information from the other for loops doesn't get added. How do I get the information from the different for loops into my CSV file? Any help would be greatly appreciated! (The print() calls in the for loops give me the right results. For responsibilities I tried combining everything into one string, but that isn't necessary; I am just trying out different solutions, and none have worked so far.)

This is an example of my XML file (some of the elements and attributes will not appear in some of the files; might this be the reason for the errors?):

<?xml version="1.0" encoding="UTF-8"?>
<!--<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="grc">-->
<?oxygen RNGSchema="teiScholiaSchema2021beta.rng" type="xml"?>

<TEI xml:lang="grc">
 <teiHeader>
  <titleStmt>
    <title cert="high">Scholia on Euripides’ Orestes 1–500</title>
    <author><note>Donald J.</note> Mastronarde</author>
   </titleStmt>
</teiHeader>
 <text>
   <div1 type="subdivisionByPlay" xml:id="Orestes">
    <div2 type="hypotheseis" xml:id="hypOrestes">
     <head type="outer" xml:lang="en">Prefatory material (argumenta/hypotheseis) for Orestes</head>
       <p>Orestes, pursuing <note cert="low">(vengeance for)</note> the murder of his father, killed Aegisthus and
        Clytemnestra. Having dared to commit matricide he paid the penalty immediately, becoming
        mad. And after Tyndareus, the father of the murdered woman, brought an accusation, the
        Argives were about to issue a public vote about him, concerning what the man who had acted
        impiously should suffer.
        </p>    
    </div2>
   </div1>
 </text>
</TEI>
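One thing worth checking in the sample above: the TEI root element with xmlns="http://www.tei-c.org/ns/1.0" is commented out. If the real Orestes.xml keeps that namespace declaration (an assumption; the question does not show the real file), that alone would produce this NameError. ElementTree stores namespaced tags as {http://www.tei-c.org/ns/1.0}note, so root.findall('.//note') finds nothing and noteat is never assigned, even though the file contains notes. A minimal sketch of namespace-aware searching, using a trimmed copy of the sample with the namespace restored:

```python
import xml.etree.ElementTree as ET

# Assumption: the real files keep the TEI namespace that is commented out in the sample.
TEI_NS = "http://www.tei-c.org/ns/1.0"
SAMPLE = (f'<TEI xmlns="{TEI_NS}"><text><p>Orestes, pursuing '
          '<note cert="low">(vengeance for)</note> the murder of his father.</p></text></TEI>')

root = ET.fromstring(SAMPLE)
print(root.tag)  # {http://www.tei-c.org/ns/1.0}TEI

# An unprefixed name only matches elements in no namespace, so this finds nothing.
plain = root.findall(".//note")

# Map a prefix of your choosing ("tei") to the namespace URI and use it in the path.
ns = {"tei": TEI_NS}
qualified = root.findall(".//tei:note", ns)

# Python 3.8+ also accepts {*} to match a tag in any namespace.
wildcard = root.findall(".//{*}note")

# Unprefixed attributes are not namespaced, so @cert searches work unchanged.
cert_elements = root.findall(".//*[@cert]")

print(len(plain), len(qualified), len(wildcard), len(cert_elements))  # 0 1 1 1
```

The same ns mapping has to be passed to every findall() call that uses the tei: prefix.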

Example of what the CSV should look like: [image: CSV that should result]

Answer:

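The variables you pass to writer.writerow() are only assigned inside the for loops. If a file contains no matching element, that loop body never runs, so the variable is never defined. You can avoid this by defining default values up front.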
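To see why: root.findall() returns an empty list when nothing matches, so a for loop over it never executes its body and the names assigned inside it are never bound. A minimal reproduction, using a trimmed copy of the sample XML (which has no element with a resp attribute):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample: it contains a <note>, but no element with a resp attribute.
SAMPLE = ('<TEI><text><p>Orestes, pursuing <note cert="low">(vengeance for)</note> '
          'the murder of his father.</p></text></TEI>')

root = ET.fromstring(SAMPLE)

# findall() returns an empty list when nothing matches...
matches = root.findall(".//*[@resp]")
print(len(matches))  # 0

# ...so this loop body never runs and 'responsibilities' is never assigned.
for responsibility in matches:
    responsibilities = responsibility.text

try:
    print(responsibilities)
    raised = False
except NameError as err:
    # Reached because the name was never bound.
    raised = True
    print("NameError:", err)
```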

Try adding the following line directly after the try: line, before the first for loop:

noteat, notetext, responsibilities, certaintytag, certaintyat, certaintytext = [''] * 6

You could of course use 'NA' instead of empty strings if you prefer.
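With those defaults in place the script no longer crashes, but the single writer.writerow() call after the loops still only records the last element each loop saw. To get every match into the CSV, one option is to collect each loop's results in a list and merge the lists row by row with itertools.zip_longest, which keeps going until the longest list is used up. This is a sketch that assumes row i should hold the i-th note, the i-th element with @resp and the i-th element with @cert side by side; the layout in the intended CSV may differ. The helper name collect_rows is hypothetical, and the sample is a trimmed copy of the XML from the question.

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Trimmed copy of the sample XML from the question (no namespace).
SAMPLE = """<TEI>
  <teiHeader>
    <titleStmt>
      <title cert="high">Scholia on Euripides’ Orestes 1–500</title>
      <author><note>Donald J.</note> Mastronarde</author>
    </titleStmt>
  </teiHeader>
  <text>
    <p>Orestes, pursuing <note cert="low">(vengeance for)</note> the murder of his father.</p>
  </text>
</TEI>"""

HEADER = ["Note Attributes", "Note Text", "Responsibility",
          "Certainty Element", "Certainty Attributes", "Certainty Text"]


def collect_rows(root):
    """Hypothetical helper: one list per search, merged into rows side by side."""
    # Each list holds one tuple per matching element; any list may be empty.
    notes = [(el.attrib, el.text) for el in root.findall(".//note")]
    resps = [(el.tag, el.attrib, el.text) for el in root.findall(".//*[@resp]")]
    certs = [(el.tag, el.attrib, el.text) for el in root.findall(".//*[@cert]")]

    rows = []
    # zip_longest yields None for lists that have run out; those become empty cells.
    for note, resp, cert in zip_longest(notes, resps, certs):
        note_at, note_text = note if note else ("", "")
        # Responsibility stays in a single column, joined into one string.
        resp_str = " ".join(str(part) for part in resp if part is not None) if resp else ""
        cert_tag, cert_at, cert_text = cert if cert else ("", "", "")
        rows.append([note_at, note_text, resp_str, cert_tag, cert_at, cert_text])
    return rows


root = ET.fromstring(SAMPLE)
rows = collect_rows(root)

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("orestes-elements.csv", "w", newline="", encoding="utf-8").
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(HEADER)
writer.writerows(rows)
print(out.getvalue())
```

writerows() writes all collected rows in one call. When writing to a real file, open it with newline="" as the csv module documentation recommends; otherwise blank lines can appear between rows on Windows.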
