I have an XML file of metadata on dissertations and I'm trying to get the author name as a single string. Names in the XML look like this:
<DISS_name>
<DISS_surname>Clark</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle/>
<DISS_suffix/>
</DISS_name>
All names have first and last names, but only some have middle names and/or suffixes. Here is my code:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle')
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix')
if author_mname is not None and author_suffix is not None:
    author_name = author_surname + ', ' + author_fname + author_mname.text + ', ' + author_suffix.text
if author_mname is not None and author_suffix is None:
    author_name = author_surname + ', ' + author_fname + author_mname.text
if author_mname is None and author_suffix is None:
    author_name = author_surname + ', ' + author_fname
Why am I getting this output and how can I fix it?
Traceback (most recent call last):
  File "C:\Users\bpclark2\pythonProject3\prqXML-to-dcCSV.py", line 185, in <module>
    author_name = author_surname + ', ' + author_fname + author_mname.text + author_suffix.text
TypeError: can only concatenate str (not "NoneType") to str
Revised code:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
author_name = author_surname + ', ' + author_fname + ' ' + str(author_mname.strip().title()) + str(', ' + author_suffix.strip().title())
row.append(author_name)
This gets the output I was looking for:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
if author_mname != '' and author_suffix != '':
    author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
    row.append(author_name)
if author_mname != '' and author_suffix == '':
    author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title()
    row.append(author_name)
if author_mname == '' and author_suffix != '':
    author_name = author_surname + ', ' + author_fname + ', ' + author_suffix.strip().title()
    row.append(author_name)
if author_mname == '' and author_suffix == '':
    author_name = author_surname + ', ' + author_fname
    row.append(author_name)
CodePudding user response:
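What about changing your code to something like this? For an empty tag such as <DISS_middle/>, find() still returns an element, but its .text is None, and that None is what triggers the TypeError. Fall back to an empty string on .text rather than on the element itself:
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
Also you could add str
casts like:
... str(author_suffix.text)
but note that str(None) is the string 'None', so the empty-string fallback is the safer choice.
And if you are on Python 3.6 or newer, please use f-strings! Life is much easier with them.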
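To illustrate the f-string suggestion, here is a minimal sketch. It assumes a record shaped like the one in the question; the <record> wrapper element and the sample data are made up for the example. It also swaps find(...).text for findtext(), which returns an empty string when the element has no text, so no None ever reaches the string formatting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the <record> wrapper is an assumption for this sketch.
record = ET.fromstring(
    "<record><DISS_authorship><DISS_author><DISS_name>"
    "<DISS_surname>Clark</DISS_surname><DISS_fname>Brian</DISS_fname>"
    "<DISS_middle/><DISS_suffix/>"
    "</DISS_name></DISS_author></DISS_authorship></record>"
)

base = 'DISS_authorship/DISS_author/DISS_name/'

# findtext() gives '' for an empty tag and the default ('') for a missing one.
surname = record.findtext(base + 'DISS_surname', default='').strip().title()
fname = record.findtext(base + 'DISS_fname', default='').strip().title()
middle = record.findtext(base + 'DISS_middle', default='').strip().title()
suffix = record.findtext(base + 'DISS_suffix', default='').strip().title()

# Start with the parts every name has, then append the optional ones.
author_name = f"{surname}, {fname}"
if middle:
    author_name += f" {middle}"
if suffix:
    author_name += f", {suffix}"

print(author_name)
```

Building the string incrementally like this avoids having to enumerate every combination of present and absent parts.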
CodePudding user response:
A shorter approach is below:
import xml.etree.ElementTree as ET
xml = '''<r><DISS_name>
<DISS_surname>Clark</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle/>
<DISS_suffix/>
</DISS_name>
<DISS_name>
<DISS_surname>Jack</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle>Smith</DISS_middle>
<DISS_suffix/>
</DISS_name>
</r>'''
root = ET.fromstring(xml)
for name in root.findall('.//DISS_name'):
    # Keep only the name parts that actually contain text
    parts = [name.find(f'DISS_{f}').text for f in ['surname', 'fname', 'middle', 'suffix'] if name.find(f'DISS_{f}').text is not None]
    print(", ".join(parts))
output
Clark, Brian
Jack, Brian, Smith
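Note that joining every part with ", " puts a comma before the middle name, whereas the question wants "Surname, First Middle, Suffix". A possible variation on the same idea is sketched below; format_name is a hypothetical helper name, and the "Jr." suffix in the sample is invented for illustration.

```python
import xml.etree.ElementTree as ET

def format_name(name):
    """Format a DISS_name element as 'Surname, First Middle, Suffix' (hypothetical helper)."""
    def text(tag):
        # findtext() returns '' for empty or missing elements, so strip() is always safe
        return name.findtext(tag, default='').strip()

    # First and middle names are separated by a space, not a comma
    given = ' '.join(p for p in (text('DISS_fname'), text('DISS_middle')) if p)
    parts = [text('DISS_surname'), given, text('DISS_suffix')]
    # Drop empty parts before joining so no stray commas appear
    return ', '.join(p for p in parts if p)

# Sample data; the 'Jr.' suffix is made up for this example.
sample = ET.fromstring(
    "<DISS_name><DISS_surname>Jack</DISS_surname><DISS_fname>Brian</DISS_fname>"
    "<DISS_middle>Smith</DISS_middle><DISS_suffix>Jr.</DISS_suffix></DISS_name>"
)
print(format_name(sample))  # Jack, Brian Smith, Jr.
```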