I am trying to write a program that creates two CSV files from an XML file. However, my data isn't being written to the CSV files; I only managed to get the header rows. The two files should show:
- How many books were published each year
- How many times each subject heading is mentioned
Here is a sample of the XML I am using:
<records>
<rec resultID="1">
<header shortDbName="cat01806a" longDbName="Simmons Library Catalog" uiTerm="sim.b2083905">
<controlInfo>
<bkinfo>
<btl>Android programming [electronic resource] : pushing the limits / Erik Hellman.</btl>
<isbn type="print">9781118717301</isbn>
<isbn type="print">9781118717356</isbn>
</bkinfo>
<jinfo />
<pubinfo>
<dt year="2014" month="01" day="01"></dt>
</pubinfo>
<artinfo>
<tig>
<atl>Android programming. [electronic resource] : pushing the limits.</atl>
</tig>
<aug>
<au>Hellman, Erik</au>
</aug>
<sug>
<subj type="unclass">Android (Electronic resource)</subj>
<subj type="unclass">Application software -- Development</subj>
<subj type="unclass">Smartphones -- Programming</subj>
<subj type="unclass">Tablet computers -- Programming</subj>
</sug>
<pubtype>eBook</pubtype>
<pubtype>Book</pubtype>
<doctype>Bibliographies</doctype>
<formats />
</artinfo>
<language>English</language>
</controlInfo>
<displayInfo>
<pLink>
<url>http://ezproxy.simmons.edu:2048/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=cat01806a&AN=sim.b2083905&site=eds-live&scope=site</url>
</pLink>
</displayInfo>
</header>
</rec>
Here is the code I have so far:
#import libraries
import csv, xml
import xml.etree.ElementTree as ET
#read open
base = ET.parse('simmons_program_books.xml')
detail = base.getroot()
#frequency count for dictionary
def count(dictionary, key):
    if key in dictionary:
        dictionary[key] += 1
    else:
        dictionary[key] = 1
#empty dictionary variables
year_count = {}
subhead_count = {}
for year in detail.iter('dt year'):
    #variable
    count(year_count, year.text)
for subhead in detail.iter('subj type'):
    count(subhead_count, subhead.text)
#to a csv (year)
year = open("year.csv", mode='w', newline='', encoding="utf-8")
write = csv.writer(year)
write.writerow(['year', '# books'])
for x, z in year_count.items():
    write.writerow([x, z])
#close
year.close()
#to a csv (subhead)
subhead = open("subhead.csv", mode='w', newline='', encoding="utf-8")
write = csv.writer(subhead)
write.writerow(['subheading', '# amt mentioned'])
for x, z in subhead_count.items():
    write.writerow([x, z])
#close
subhead.close()
I'm not sure what's wrong.
CodePudding user response:
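Your iter() calls are looking for elements named 'dt year' and 'subj type', which don't exist: iter() matches on the tag name only, and year and type are attributes, not part of the tag. Iterate over 'dt' and 'subj' instead. To populate the year in the dictionary, use year.get('year') instead of year.text, because the year is stored in an attribute and the dt element has no text.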
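To make the difference concrete, here is a minimal sketch. It assumes a trimmed-down stand-in for the sample record (the SAMPLE string and the variable names are made up for illustration) and only shows how iter() and get() behave; it is not the full program.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the sample record in the question (assumption).
SAMPLE = """<records>
  <rec resultID="1">
    <pubinfo><dt year="2014" month="01" day="01"></dt></pubinfo>
    <sug>
      <subj type="unclass">Android (Electronic resource)</subj>
      <subj type="unclass">Smartphones -- Programming</subj>
    </sug>
  </rec>
</records>"""

root = ET.fromstring(SAMPLE)

# iter() compares against the tag name only, so "tag attribute" matches nothing.
no_match = list(root.iter('dt year'))

# Iterate over the tag, then read the attribute from each element.
years = [dt.get('year') for dt in root.iter('dt')]

# The subject heading is the element's text, so .text is the right call here.
subjects = [subj.text for subj in root.iter('subj')]

print(len(no_match), years, subjects)
```

Because the first loop in the question never finds a match, both dictionaries stay empty and only the header rows reach the CSV files.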
CodePudding user response:
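First, detail.iter('dt year') won't work. Iterate over the dt elements and then read the year attribute of each one.
Second, in this version count returns the dictionary so that the result can be assigned back. Since the dictionary is updated in place the return is optional, but without it an assignment like year_count = count(...) would set year_count to None.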
#frequency count for dictionary
def count(dictionary, key):
    if key in dictionary:
        dictionary[key] += 1
    else:
        dictionary[key] = 1
    return dictionary
#empty dictionary variables
year_count = {}
subhead_count = {}
for dt in detail.iter('dt'):
    #variable
    year_count = count(year_count, dt.attrib['year'])
    print('year', dt, dt.attrib['year'])
for subhead in detail.iter('subj'):
    subhead_count = count(subhead_count, subhead.text)
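As an alternative to a hand-written count function, the same tallies could be built with collections.Counter from the standard library. The sketch below is one possible arrangement, not the original poster's code: the SAMPLE data and the write_counts helper are hypothetical, and it writes to an in-memory buffer so it runs without any files on disk.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample that reuses the element names from the question's XML.
SAMPLE = """<records>
  <rec resultID="1">
    <dt year="2014" month="01" day="01"></dt>
    <subj type="unclass">Smartphones -- Programming</subj>
    <subj type="unclass">Android (Electronic resource)</subj>
  </rec>
  <rec resultID="2">
    <dt year="2014" month="06" day="01"></dt>
    <subj type="unclass">Smartphones -- Programming</subj>
  </rec>
  <rec resultID="3">
    <dt year="2013" month="01" day="01"></dt>
  </rec>
</records>"""


def write_counts(handle, header, counts):
    """Hypothetical helper: write a header row, then one row per (key, total)."""
    writer = csv.writer(handle)
    writer.writerow(header)
    for key, total in counts.items():
        writer.writerow([key, total])


# For the real file, ET.parse('simmons_program_books.xml').getroot() would go here.
root = ET.fromstring(SAMPLE)

# Counter tallies each value it sees, replacing the manual if/else.
year_count = Counter(dt.get('year') for dt in root.iter('dt'))
subhead_count = Counter(subj.text for subj in root.iter('subj'))

# In-memory stand-in for year.csv so the sketch has no side effects.
year_csv = io.StringIO()
write_counts(year_csv, ['year', '# books'], year_count)
print(year_csv.getvalue())
```

To write real files, the buffer can be swapped for a file opened in a with block, for example with open("year.csv", mode='w', newline='', encoding="utf-8") as handle, which also closes the file automatically.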