I'm writing a program in Python that looks at an XML file that I get from an API and should return a list of users' initials to a list for later use. My XML file looks like this with about 60 users:
<ArrayOfuser xmlns="WebsiteWhereDataComesFrom.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<user>
<active>true</active>
<datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
<dept>3</dept>
<email>useremail</email>
<firstname>userfirstname</firstname>
<lastname>userlastname</lastname>
<lastupdated>2/6/2019 11:10:29 PM</lastupdated>
<lastupdatedby>lastupdateduserinitials</lastupdatedby>
<loginemail>userloginemail</loginemail>
<phone1>userphone</phone1>
<phone2/>
<rep>userinitials1</rep>
</user>
<user>
<active>true</active>
<datelastlogin>12/1/2022 3:31:25 PM</datelastlogin>
<dept>5</dept>
<email>useremail</email>
<firstname>userfirstname</firstname>
<lastname>userlastname</lastname>
<lastupdated>4/8/2020 3:02:08 PM</lastupdated>
<lastupdatedby>lastupdateduserinitials</lastupdatedby>
<loginemail>userloginemail</loginemail>
<phone1>userphone</phone1>
<phone2/>
<rep>userinitials2</rep>
</user>
...
...
...
</ArrayOfuser>
I'm trying to use an XML parser to return the text in the <rep>
tag for each user to a list. I would also love to have it sorted by date of last login, but that's not something I need and I'll just alphabetize the list if sorting by date overcomplicates this process.
The code below shows my attempt at just printing the data without saving it to a list, but the output is unexpected as shown below as well. Code I tried:
#load file
activeusers = etree.parse("activeusers.xml")
#declare namespaces
ns = {'xx': 'http://schemas.datacontract.org/2004/07/IQWebAPI.Users'}
#locate rep tag and print (saving to list once printing shows expected output)
targets = activeusers.xpath('//xx:user[xx:rep]',namespaces=ns)
for target in targets:
print(target.attrib)
Output:
{}
{}
I'm expecting the output to look like the below codeblock. Once it looks something like that I should be able to change the print statement to instead save to a list.
{userinitials1}
{userinitials2}
I think my issue comes from what's inside my print statement with printing the attribute. I tried this with variations of target.getparent()
with keys()
, items()
, and get()
as well and they all seem to show the same empty output when printed.
EDIT: I found a post from someone with a similar problem that had been solved and the solution was to use this code but I changed filenames to suit my need:
root = (etree.parse("activeusers.xml"))
values = [s.find('rep').text for s in root.findall('.//user') if s.find('rep') is not None]
print(values)
Again, the expected output was a populated list but when printed the list is empty. I think now my issue may have to do with the fact that my document contains namespaces. For my use, I may just delete them since I don't think these will end up being required so please correct me if namespaces are more important than I realize.
SECOND EDIT: I also realized the API can send me this data in a JSON format and not just XML so that file would look like the below codeblock. Any solution that can append the text in the "rep" child of each user to a list in JSON format or XML is perfect and would be greatly appreciated since once I have this list, I will not need to use the XML or JSON file for any other use.
[
{
"active": true,
"datelastlogin": "8/21/2019 9:16:30 PM",
"dept": 3,
"email": "useremail",
"firstname": "userfirstname",
"lastname": "userlastname",
"lastupdated": "2/6/2019 11:10:29 PM",
"lastupdatedby": "lastupdateduserinitials",
"loginemail": "userloginemail",
"phone1": "userphone",
"phone2": "",
"rep": "userinitials1"
},
{
"active": true,
"datelastlogin": "12/1/2022 3:31:25 PM",
"dept": 5,
"email": "useremail",
"firstname": "userfirstname",
"lastname": "userlastname",
"lastupdated": "4/8/2020 3:02:08 PM",
"lastupdatedby": "lastupdateduserinitials",
"loginemail": "userloginemail",
"phone1": "userphone",
"phone2": "",
"rep": "userinitials2"
}
]
CodePudding user response:
As this is xml with namespace, you can have like
import xml.etree.ElementTree as ET
root = ET.fromstring(xml_in_qes)
my_ns = {'root': 'WebsiteWhereDataComesFrom.com'}
myUser=[]
for eachUser in root.findall('root:user',my_ns):
rep=eachUser.find("root:rep",my_ns)
print(rep.text)
myUser.append(rep.text)
note: xml_in_qes is the XML attached in this question.
('root:user',my_ns):
search user in my_ns which has key root
i.e WebsiteWhereDataComesFrom.com
CodePudding user response:
XML data implementation:
import xml.etree.ElementTree as ET
xmlstring = '''
<ArrayOfuser>
<user>
<active>true</active>
<datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
<dept>3</dept>
<email>useremail</email>
<firstname>userfirstname</firstname>
<lastname>userlastname</lastname>
<lastupdated>2/6/2019 11:10:29 PM</lastupdated>
<lastupdatedby>lastupdateduserinitials</lastupdatedby>
<loginemail>userloginemail</loginemail>
<phone1>userphone</phone1>
<phone2/>
<rep>userinitials1</rep>
</user>
<user>
<active>true</active>
<datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
<dept>3</dept>
<email>useremail</email>
<firstname>userfirstname</firstname>
<lastname>userlastname</lastname>
<lastupdated>2/6/2019 11:10:29 PM</lastupdated>
<lastupdatedby>lastupdateduserinitials</lastupdatedby>
<loginemail>userloginemail</loginemail>
<phone1>userphone</phone1>
<phone2/>
<rep>userinitials2</rep>
</user>
<user>
<active>true</active>
<datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
<dept>3</dept>
<email>useremail</email>
<firstname>userfirstname</firstname>
<lastname>userlastname</lastname>
<lastupdated>2/6/2019 11:10:29 PM</lastupdated>
<lastupdatedby>lastupdateduserinitials</lastupdatedby>
<loginemail>userloginemail</loginemail>
<phone1>userphone</phone1>
<phone2/>
<rep>userinitials3</rep>
</user>
</ArrayOfuser>
'''
user_array = ET.fromstring(xmlstring)
replist = []
for users in user_array.findall('user'):
replist.append((users.find('rep').text))
print(replist)
Output:
['userinitials1', 'userinitials2', 'userinitials3']
JSON data implementation:
userlist = [
{
"active": "true",
"datelastlogin": "8/21/2019 9:16:30 PM",
"dept": 3,
"email": "useremail",
"firstname": "userfirstname",
"lastname": "userlastname",
"lastupdated": "2/6/2019 11:10:29 PM",
"lastupdatedby": "lastupdateduserinitials",
"loginemail": "userloginemail",
"phone1": "userphone",
"phone2": "",
"rep": "userinitials1"
},
{
"active": "true",
"datelastlogin": "12/1/2022 3:31:25 PM",
"dept": 5,
"email": "useremail",
"firstname": "userfirstname",
"lastname": "userlastname",
"lastupdated": "4/8/2020 3:02:08 PM",
"lastupdatedby": "lastupdateduserinitials",
"loginemail": "userloginemail",
"phone1": "userphone",
"phone2": "",
"rep": "userinitials2"
},
{
"active": "true",
"datelastlogin": "12/1/2022 3:31:25 PM",
"dept": 5,
"email": "useremail",
"firstname": "userfirstname",
"lastname": "userlastname",
"lastupdated": "4/8/2020 3:02:08 PM",
"lastupdatedby": "lastupdateduserinitials",
"loginemail": "userloginemail",
"phone1": "userphone",
"phone2": "",
"rep": "userinitials3"
}
]
replist = []
for user in userlist:
replist.append(user["rep"])
print(replist)
Output:
['userinitials1', 'userinitials2', 'userinitials3']