TypeError when parsing XML


I have an XML file of metadata on dissertations and I'm trying to get the author name as a single string. Names in the XML look like this:


All names have first and last names, but only some have middle names and/or suffixes. Here is my code:

    author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
    author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
    author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle')
    author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix')
    if author_mname is not None and author_suffix is not None:
        author_name = author_surname + ', ' + author_fname + author_mname.text + ', ' + author_suffix.text
    if author_mname is not None and author_suffix is None:
        author_name = author_surname + ', ' + author_fname + author_mname.text
    if author_mname is None and author_suffix is None:
        author_name = author_surname + ', ' + author_fname

Why am I getting this output and how can I fix it?

Traceback (most recent call last):
  File "C:\Users\bpclark2\pythonProject3\prqXML-to-dcCSV.py", line 185, in <module>
    author_name = author_surname + ', ' + author_fname + author_mname.text + author_suffix.text
TypeError: can only concatenate str (not "NoneType") to str

Revised code:

    author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
    author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
    author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
    author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
    author_name = author_surname + ', ' + author_fname + ' ' + str(author_mname.strip().title()) + str(', ' + author_suffix.strip().title())

This gets the output I was looking for:

    author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
    author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
    author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
    author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
    author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
    if author_mname != '' and author_suffix != '':
        author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
    if author_mname != '' and author_suffix == '':
        author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title()
    if author_mname == '' and author_suffix != '':
        author_name = author_surname + ', ' + author_fname + ', ' + author_suffix.strip().title()
    if author_mname == '' and author_suffix == '':
        author_name = author_surname + ', ' + author_fname
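
The four branches above can be collapsed by building the optional pieces separately. The following is only a minimal sketch under some assumptions: `format_author` is a hypothetical helper name, the sample values are made up for illustration, and the element names are taken from the code above. It relies on `findtext`, which returns the supplied default when the element is missing and an empty string when the element exists but has no text.

```python
import xml.etree.ElementTree as ET


def format_author(name_elem):
    """Return 'Surname, First[ Middle][, Suffix]' for a DISS_name element."""
    # findtext gives '' for a missing element (the default) or an empty one
    surname = name_elem.findtext('DISS_surname', default='').strip().title()
    fname = name_elem.findtext('DISS_fname', default='').strip().title()
    mname = name_elem.findtext('DISS_middle', default='').strip().title()
    suffix = name_elem.findtext('DISS_suffix', default='').strip().title()

    author_name = f'{surname}, {fname}'
    if mname:
        # Middle name follows the first name, separated by a space
        author_name += f' {mname}'
    if suffix:
        # Suffix is appended after a comma
        author_name += f', {suffix}'
    return author_name


# Hypothetical sample record; the values are invented for this example
sample = ET.fromstring(
    '<DISS_name>'
    '<DISS_surname>clark</DISS_surname>'
    '<DISS_fname>brian</DISS_fname>'
    '<DISS_middle/>'
    '<DISS_suffix>jr.</DISS_suffix>'
    '</DISS_name>'
)
print(format_author(sample))
```

One caveat with this sketch (and with the code above): `.title()` turns a suffix such as `III` into `Iii`, so it may be safer to leave the suffix as it appears in the XML.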

CodePudding user response:

What about changing your code to something like this, so that both values are always strings:

    author_mname = record.findtext('DISS_authorship/DISS_author/DISS_name/DISS_middle', default='')
    author_suffix = record.findtext('DISS_authorship/DISS_author/DISS_name/DISS_suffix', default='')

You could also keep find() and add str casts, like:

    ... + str(author_suffix.text)

but note that str(None) is the string 'None', which would then end up in the name.

And if you are on Python 3.6 or newer, please use f-strings! Life is much easier with them.
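
As a sketch of the f-string suggestion: the TypeError in the question is consistent with an element that exists but is empty, because `find` then returns an element whose `.text` is `None`. The example below uses made-up sample XML and swaps in `findtext` with a default, so every part is already a string before it reaches the f-string. Treat it as one possible approach, not the only one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the middle name element is present but empty
name = ET.fromstring(
    '<DISS_name>'
    '<DISS_surname>Clark</DISS_surname>'
    '<DISS_fname>Brian</DISS_fname>'
    '<DISS_middle/>'
    '</DISS_name>'
)

# find returns the element, but an empty element has no text
middle_elem = name.find('DISS_middle')
print(middle_elem is not None, middle_elem.text)  # True None

# findtext returns '' for an empty element and the default for a missing one
surname = name.findtext('DISS_surname', default='')
fname = name.findtext('DISS_fname', default='')
mname = name.findtext('DISS_middle', default='')
suffix = name.findtext('DISS_suffix', default='')

# Build the name with f-strings; strip() drops the trailing space
# left behind when there is no middle name
author_name = f'{surname}, {fname} {mname}'.strip()
if suffix:
    author_name = f'{author_name}, {suffix}'
print(author_name)  # Clark, Brian
```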

CodePudding user response:

A shorter approach:

    import xml.etree.ElementTree as ET

    # The sample XML was cut off in the original post; only its opening survives
    xml = '''<r><DISS_name>...'''
    root = ET.fromstring(xml)
    for name in root.findall('.//DISS_name'):
        # Keep only the name parts whose element has text
        parts = [name.find(f'DISS_{f}').text for f in ['surname', 'fname', 'middle', 'suffix'] if name.find(f'DISS_{f}').text is not None]
        print(", ".join(parts))

Output:

    Clark, Brian
    Jack, Brian, Smith
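
Since the sample XML in the snippet above is cut off, here is a self-contained sketch of the same idea. The XML below is hypothetical, chosen only so that it is consistent with the two output lines shown, and `name_parts` is an illustrative helper name. It uses `findtext`, so a missing element is skipped instead of raising an AttributeError.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the original sample was truncated, so these records are assumed
xml = '''<r>
  <DISS_name>
    <DISS_surname>Clark</DISS_surname>
    <DISS_fname>Brian</DISS_fname>
    <DISS_middle/>
  </DISS_name>
  <DISS_name>
    <DISS_surname>Jack</DISS_surname>
    <DISS_fname>Brian</DISS_fname>
    <DISS_middle>Smith</DISS_middle>
    <DISS_suffix></DISS_suffix>
  </DISS_name>
</r>'''


def name_parts(name):
    """Collect the non-empty parts of one DISS_name element, in order."""
    parts = []
    for field in ['surname', 'fname', 'middle', 'suffix']:
        # findtext returns None when the element is missing, '' when it is empty
        text = name.findtext(f'DISS_{field}')
        if text and text.strip():
            parts.append(text.strip())
    return parts


root = ET.fromstring(xml)
for name in root.findall('.//DISS_name'):
    print(", ".join(name_parts(name)))
```

Note that joining every part with ", " gives "Jack, Brian, Smith", which differs from the "Surname, First Middle, Suffix" layout built in the question.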