Python - Construct DF From XML API Response-CodePudding

I'm receiving the below XML response from an API call and am looking to iterate through the "Results" and store all of the data points as a pandas dataframe.

I was successfully able to grab my data points of interest by chaining .find() methods shown below, but don't know how to loop through all of the Results block within the body given the structure of the XML response.

I am using Python 3.7 in Jupyter on Windows.

What I've Tried:

import pandas as pd
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET


soup = BeautifulSoup(soap_response.text, "xml")
# print(soup.prettify())

objectid_field = soup.find('Results').find('ObjectID').text
customerkey_field = soup.find('Results').find('CustomerKey').text
name_field = soup.find('Results').find('Name').text
issendable_field = name_field = soup.find('Results').find('IsSendable').text
sendablesubscribe_field = soup.find('Results').find('SendableSubscriberField').text

# for de in soup:
#     de_name = soup.find('Results').find('Name').text
#     print(de_name)


# test_df = pd.read_xml(soup,
#                       xpath="//Results",
#                       namespaces={""})

Sample XML Data Structure:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2003/soap-envelope"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema"
    xmlns:xsd="http://www.w3.org/XMLSchema"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
    xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-201-wss-wssecurity-secext-1.0.xsd"
    xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-201-wss-security-1.0.xsd">
    <env:Header
        xmlns:env="http://www.w3.org/2003/05/soap-envelope">
        <wsa:Action>RetrieveResponse</wsa:Action>
        <wsa:MessageID>urn:uuid:1234</wsa:MessageID>
        <wsa:RelatesTo>urn:uuid:1234</wsa:RelatesTo>
        <wsa:To>http://schemas.xmlsoap.org/ws/2004/08/dressing/role/anonymous</wsa:To>
        <wsse:Security>
            <wsu:Timestamp wsu:Id="Timestamp-1234">
                <wsu:Created>2021-11-07T13:10:54Z</wsu:Created>
                <wsu:Expires>2021-11-07T13:15:54Z</wsu:Expires>
            </wsu:Timestamp>
        </wsse:Security>
    </env:Header>
    <soap:Body>
        <RetrieveResponseMsg
            xmlns="http://partnerAPI">
            <OverallStatus>OK</OverallStatus>
            <RequestID>f9876</RequestID>
            <Results xsi:type="Data">
                <PartnerKey xsi:nil="true" />
                <ObjectID>Object1</ObjectID>
                <CustomerKey>Customer1</CustomerKey>
                <Name>Test1</Name>
                <IsSendable>true</IsSendable>
                <SendableSubscriberField>
                    <Name>_Something1</Name>
                </SendableSubscriberField>
            </Results>
            <Results xsi:type="Data">
                <PartnerKey xsi:nil="true" />
                <ObjectID>Object2</ObjectID>
                <CustomerKey>Customer2</CustomerKey>
                <Name>Name2</Name>
                <IsSendable>true</IsSendable>
                <SendableSubscriberField>
                    <Name>_Something2</Name>
                </SendableSubscriberField>
            </Results>
            <Results xsi:type="Data">
                <PartnerKey xsi:nil="true" />
                <ObjectID>Object3</ObjectID>
                <CustomerKey>AnotherKey</CustomerKey>
                <Name>Something3</Name>
                <IsSendable>false</IsSendable>
            </Results>
        </RetrieveResponseMsg>
    </soap:Body>
</soap:Envelope>'

CodePudding user response：

You're super close, you need to find all of the Results tags, then iterate over them, last grabbing the elements you want:

for el in soup.find_all('Results'):
    objectid_field = el.find('ObjectID').text
    customerkey_field = el.find('CustomerKey').text
    name_field = el.find('Name').text
    issendable_field = name_field = el.find('IsSendable').text
    sendablesubscribe_field = el.find('SendableSubscriberField').text

However, SendableSubscriberField isn't always there, so you might need to check if sendable is True first:

for el in soup.find_all('Results'):
    objectid_field = el.find('ObjectID').text
    customerkey_field = el.find('CustomerKey').text
    name_field = el.find('Name').text
    issendable_field = el.find('IsSendable').text

    # skip if not sendable
    if issendable_field == 'false':
        sendablesubscribe_field = None
        continue

    sendablesubscribe_field = el.find('SendableSubscriberField').find('Name').text

Edit: Constructing the dataframe

To build the dataframe from this, I'd collect everything into a list of dictionaries:

import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(...)

data = []

for el in soup.find_all('Results'):
    record = {}

    record['ObjectID'] = el.find('ObjectID').text
    record['CustomerKey'] = el.find('CustomerKey').text
    record['Name'] = el.find('Name').text
    record['IsSendable'] = el.find('IsSendable').text

    # skip if not sendable
    if record['IsSendable'] == 'false':
        record['SendableSubscriberField'] = None
        continue

    record['SendableSubscriberField'] = el.find('SendableSubscriberField').find('Name').text

    data.append(record)


df = pd.DataFrame(data)

CodePudding user response：

Reconsider use of pandas.read_xml by acknowledging the default namespace (http://partnerAPI). Also, since you need a lower-level value, run read_xml twice and join the results. Notice all attribute and element values are returned even if missing.

soap_df = (
    pd.read_xml(
        soap_response.text, 
        xpath = ".//rrm:RetrieveResponseMsg/rrm:Results",
        namespaces = {"rrm": "http://partnerAPI"}
    ).join(
        pd.read_xml(
            soap_response.text, 
            xpath = ".//rrm:RetrieveResponseMsg/rrm:Results/rrm:SendableSubscriberField",
            namespaces = {"rrm": "http://partnerAPI"},
            names = ["SendableSubscriberField_Name", ""]
        ),
    )
)
    
print(soap_df)
#    type  PartnerKey ObjectID CustomerKey        Name  IsSendable  SendableSubscriberField SendableSubscriberField_Name
# 0  Data         NaN  Object1   Customer1       Test1        True                      NaN                  _Something1
# 1  Data         NaN  Object2   Customer2       Name2        True                      NaN                  _Something2
# 2  Data         NaN  Object3  AnotherKey  Something3       False                      NaN                          NaN