<ValueList><Count>62</Count><MaxCount>62</MaxCount>
<Value ref="123456"> <DisplayName origin="UID"><100056></DisplayName><DisplayName origin="Default"><![CDATA[Xee]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=abcd]]></UniqueAlias><Hierarchy><![CDATA[xee]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/x]]></AdditionalField><AdditionalField label="Country"><![CDATA[Singapore]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
<Value ref="1234567"> <DisplayName origin="UID"><100046></DisplayName><DisplayName origin="Default"><![CDATA[Xabc]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=efgh]]></UniqueAlias><Hierarchy><![CDATA[Gee]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/h]]></AdditionalField><AdditionalField label="Country"><![CDATA[Malaysia]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
<Value ref="8984379"> <DisplayName origin="UID"><100066></DisplayName><DisplayName origin="Default"><![CDATA[WRFMDS]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=test1]]></UniqueAlias><Hierarchy><![CDATA[LEE]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/K]]></AdditionalField><AdditionalField label="Country"><![CDATA[USA]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
<Value ref="1234567"> <DisplayName origin="UID"><100446></DisplayName><DisplayName origin="Default"><![CDATA[LKGJSML]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=KLPS]]></UniqueAlias><Hierarchy><![CDATA[abeed]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/L]]></AdditionalField><AdditionalField label="Country"><![CDATA[uk]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
</valueList>
From XML of the above form, I want to extract the following values and create a data frame:
the ref value "123456" from the Value tag,
the value "STATUS=Active" from the UniqueAlias tag,
the value "ORG=test1" from the UniqueAlias tag,
the value "Xee" from the Hierarchy tag,
the value "Singapore" from the AdditionalField label="Country" tag,
the value "MC/x" from the AdditionalField label="Organisation" tag.
The data frame should be built by looping through the same tags for every Value in the XML file.
Thanks in advance.
CodePudding user response:
You would need to extract each element separately. The following example should get you started:
from bs4 import BeautifulSoup
import pandas as pd
xml = """<ValueList>
<Count>62</Count><MaxCount>62</MaxCount>
<Value ref="123456"> <DisplayName origin="UID"><100056></DisplayName><DisplayName origin="Default"><![CDATA[Xee]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=abcd]]></UniqueAlias><Hierarchy><![CDATA[xee]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/x]]></AdditionalField><AdditionalField label="Country"><![CDATA[Singapore]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
<Value ref="1234567"> <DisplayName origin="UID"><100046></DisplayName><DisplayName origin="Default"><![CDATA[Xabc]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=efgh]]></UniqueAlias><Hierarchy><![CDATA[Gee]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/h]]></AdditionalField><AdditionalField label="Country"><![CDATA[Malaysia]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
<Value ref="8984379"> <DisplayName origin="UID"><100066></DisplayName><DisplayName origin="Default"><![CDATA[WRFMDS]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=test1]]></UniqueAlias><Hierarchy><![CDATA[LEE]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/K]]></AdditionalField><AdditionalField label="Country"><![CDATA[USA]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
<Value ref="1234567"> <DisplayName origin="UID"><100446></DisplayName><DisplayName origin="Default"><![CDATA[LKGJSML]]></DisplayName><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=KLPS]]></UniqueAlias><Hierarchy><![CDATA[abeed]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/L]]></AdditionalField><AdditionalField label="Country"><![CDATA[uk]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
</valueList>"""
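data = []
soup = BeautifulSoup(xml, "html.parser")

# html.parser lowercases tag names, so the lookups below use lowercase.
for value in soup.valuelist.find_all('value'):
    # Collected in document order: Organisation, Country, ProjectStatus
    additional_fields = [field.text for field in value.find_all('additionalfield')]
    data.append([
        value['ref'],
        value.uniquealias.text,  # first UniqueAlias only (STATUS=...)
        value.hierarchy.text,
        additional_fields[1],    # Country
        additional_fields[0],    # Organisation
    ])

df = pd.DataFrame(data, columns=['Value', 'UniqueAlias', 'Hierarchy', 'AddField1', 'AddField2'])
print(df)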
This gives you a dataframe with one row per Value element:
     Value    UniqueAlias Hierarchy  AddField1 AddField2
0   123456  STATUS=Active       xee  Singapore      MC/x
1  1234567  STATUS=Active       Gee   Malaysia      MC/h
2  8984379  STATUS=Active       LEE        USA      MC/K
3  1234567  STATUS=Active     abeed         uk      MC/L
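The example above keeps only the first UniqueAlias and picks the AdditionalField values by position. If you also need the ORG alias, or the order of the fields can vary, one possible variation is to key everything by name. The sketch below is only an illustration under some assumptions: it uses the standard library's xml.etree.ElementTree, which is a strict parser, so it assumes the real file is well-formed XML. The sample as pasted is not (the closing tag is </valueList> and the DisplayName elements contain <100056>), so the snippet uses a trimmed two-row copy with the DisplayName elements left out and the closing tag matched. The helper name value_to_row and the column names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of two rows from the sample (assumption: the real
# file is valid XML; DisplayName elements are omitted here).
xml = """<ValueList>
<Count>62</Count><MaxCount>62</MaxCount>
<Value ref="123456"><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=abcd]]></UniqueAlias><Hierarchy><![CDATA[xee]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/x]]></AdditionalField><AdditionalField label="Country"><![CDATA[Singapore]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
<Value ref="8984379"><UniqueAlias><![CDATA[STATUS=Active]]></UniqueAlias><UniqueAlias><![CDATA[ORG=test1]]></UniqueAlias><Hierarchy><![CDATA[LEE]]></Hierarchy><AdditionalField label="Organisation"><![CDATA[MC/K]]></AdditionalField><AdditionalField label="Country"><![CDATA[USA]]></AdditionalField><AdditionalField label="ProjectStatus"><![CDATA[Active]]></AdditionalField></Value>
</ValueList>"""


def value_to_row(value):
    """Flatten one <Value> element into a dict keyed by column name (hypothetical helper)."""
    row = {"ref": value.get("ref"), "Hierarchy": value.findtext("Hierarchy")}
    # Use the part before "=" (STATUS, ORG) as the column name and keep the
    # full "KEY=value" text as the cell value.
    for alias in value.findall("UniqueAlias"):
        text = alias.text or ""
        key = text.partition("=")[0]
        row[key] = text
    # Key each AdditionalField by its label attribute instead of its position.
    for field in value.findall("AdditionalField"):
        row[field.get("label")] = field.text
    return row


root = ET.fromstring(xml)
rows = [value_to_row(v) for v in root.findall("Value")]
for row in rows:
    print(row)
```

Each entry in rows is a plain dict, so passing the list to pd.DataFrame(rows) should give one column per key (ref, Hierarchy, STATUS, ORG, Organisation, Country, ProjectStatus), and you can then select just the columns you need.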