While parsing an XML request from some program, I've ended up with a rather complicated structure: a dict of dicts of dicts of dicts, where some of the dicts also contain lists of dicts. Because the structure is so awkward, my dict contains lots of "garbage" keys, BEGIN_ and Value, throughout its depth.
For example:
<depart>
  <BEGIN_>
    <id Value=""/>
    <code Value=""/>
    <name Value=""/>
    <declNameList/>
  </BEGIN_>
</depart>
is converted into
{'depart': {'BEGIN_': {'id': {'Value': ''},
                       'code': {'Value': ''},
                       'name': {'Value': ''},
                       'declNameList': None}}}
but what I need is:
{'depart': {'id': '',
            'code': '',
            'name': '',
            'declNameList': None}}
Could you please help me remove this garbage at every level of nesting, using recursion?
So far I have managed to transform h = {'status': {'BEGIN_': {'statusCode': {'Value': '0'}}}}
into {'status': {'statusCode': {'Value': '0'}}}
using:
if 'Value' in h['status'].keys():
    h['status'] = h['status']['Value']
if 'BEGIN_' in h['status'].keys():
    h['status'] = h['status']['BEGIN_']
But I need to apply this kind of filter to the whole dictionary.
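The two checks above can be generalized into a recursive function that applies them at every depth. What follows is a minimal sketch, not a definitive implementation. It assumes the only wrappers are BEGIN_ keys, whose contents are merged into the parent dict, and dicts whose sole key is Value, which are replaced by that value. The name strip_wrappers is hypothetical. The function returns a new structure instead of modifying h in place, and it relies on Python recursion, so extremely deep documents could hit the default recursion limit (1000 frames in CPython).

```python
from typing import Any


def strip_wrappers(node: Any) -> Any:
    """Hypothetical helper: return a cleaned copy of node.

    Assumes BEGIN_ keys are wrappers to merge into their parent dict and
    that a dict whose only key is 'Value' stands for that value.
    """
    if isinstance(node, list):
        # Clean every element of a list independently.
        return [strip_wrappers(item) for item in node]
    if not isinstance(node, dict):
        # Strings, None and other scalars are returned unchanged.
        return node
    if node.keys() == {"Value"}:
        # {'Value': x} collapses to (the cleaned) x.
        return strip_wrappers(node["Value"])
    result = {}
    for key, value in node.items():
        cleaned = strip_wrappers(value)
        if key == "BEGIN_" and isinstance(cleaned, dict):
            # Merge the wrapper's cleaned contents into the current level.
            result.update(cleaned)
        else:
            result[key] = cleaned
    return result


h = {'status': {'BEGIN_': {'statusCode': {'Value': '0'}}}}
print(strip_wrappers(h))
# {'status': {'statusCode': '0'}}
```

Because each BEGIN_ value is cleaned before it is merged, a BEGIN_ nested directly inside another BEGIN_ is also flattened.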
Answer:
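As mentioned in the comments, it would be preferable to solve this while parsing the XML, if that is possible. Otherwise, here is a non-recursive solution. It keeps every nested dict or list in a work list (named queue below, although pop() takes the most recently added item, so it behaves like a stack), flattens each BEGIN_ wrapper into its parent dict, and replaces each {'Value': x} dict with x: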
xml_dict = {
'depart': {
'BEGIN_': {
'id1': {'Value': '11'},
'code1': {'Value': '11'},
'name1': {'Value': '11'},
'declNameList1': None
}
},
'BEGIN_': {
"1": [
{
'id2': {'Value': '22'},
'code2': {'Value': '22'},
'name2': {'Value': '22'},
'declNameList2': None
},
{
'id3': {'Value': '33'},
'code3': {'Value': '33'},
'name3': {'Value': '33'},
'declNameList3': None
},
],
"2": [
{
'id4': {'Value': '44'},
'code4': {'Value': '44'},
'name4': {'Value': '44'},
'declNameList4': {
'code5': {'Value': '55'}
},
},
{
'id6': {'Value': '66'},
'code6': {'Value': '66'},
'name6': {'Value': '66'},
'declNameList6': {
'code7': {
'BEGIN_': {
'name8': {'Value': '8'}
}
}
},
},
{
'any1': {'Value': '1'}
},
[
{
"BEGIN_": {
'any2': {'Value': '2'}
},
},
{
"BEGIN_": {
'any3': {'Value': '3'}
},
}
]
]
}
}
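# Requires Python 3.8+ for the := operator. Modifies xml_dict in place.
queue = [xml_dict]  # pending containers; pop() makes this a stack
while queue:
    data = queue.pop()
    if isinstance(data, dict):
        # Flatten BEGIN_ wrappers into this dict; loop in case a BEGIN_ directly contains another BEGIN_.
        while begin_value := data.pop("BEGIN_", None):
            data.update(begin_value)
        for key, value in data.items():
            # A dict holding only "Value" collapses to that value.
            if isinstance(value, dict) and value.keys() == {"Value"}:
                data[key] = value["Value"]
            elif isinstance(value, (dict, list)):
                queue.append(value)  # clean nested containers later
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, (dict, list)):
                queue.append(item)
print(xml_dict)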
Output (pretty-printed for readability; print() shows it on one line with single quotes):
{
"depart": {
"id1": "11",
"code1": "11",
"name1": "11",
"declNameList1": None
},
"1": [
{
"id2": "22",
"code2": "22",
"name2": "22",
"declNameList2": None
},
{
"id3": "33",
"code3": "33",
"name3": "33",
"declNameList3": None
}
],
"2": [
{
"id4": "44",
"code4": "44",
"name4": "44",
"declNameList4": {
"code5": "55"
}
},
{
"id6": "66",
"code6": "66",
"name6": "66",
"declNameList6": {
"code7": {
"name8": "8"
}
}
},
{
"any1": "1"
},
[
{
"any2": "2"
},
{
"any3": "3"
}
]
]
}
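If you do control the parsing step, as suggested at the start of this answer, the wrappers can be dropped while the dict is being built. The question doesn't say which parser produced the dict, so the sketch below uses the standard library's xml.etree.ElementTree as a stand-in. It is a minimal sketch under several assumptions, all labeled in the code: the data lives only in Value attributes (element text is ignored), BEGIN_ elements are transparent, empty elements become None, and repeated sibling tags are collected into a list. The name element_to_dict is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any


def element_to_dict(elem: ET.Element) -> Any:
    """Hypothetical helper: convert elem's content, dropping BEGIN_ and Value wrappers."""
    if "Value" in elem.attrib:
        # Assumption: an element with a Value attribute is a leaf holding that string.
        return elem.attrib["Value"]
    children = list(elem)
    if not children:
        # Empty element such as <declNameList/>; element text is never read here.
        return None
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag == "BEGIN_":
            # Assumption: BEGIN_ is transparent, so merge its contents into this level.
            if isinstance(value, dict):
                result.update(value)
            continue
        if child.tag in result:
            # Assumption: repeated sibling tags are collected into a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                existing = [existing]
                result[child.tag] = existing
            existing.append(value)
        else:
            result[child.tag] = value
    return result


xml_text = """<depart>
  <BEGIN_>
    <id Value=""/>
    <code Value=""/>
    <name Value=""/>
    <declNameList/>
  </BEGIN_>
</depart>"""

root = ET.fromstring(xml_text)
print({root.tag: element_to_dict(root)})
# {'depart': {'id': '', 'code': '', 'name': '', 'declNameList': None}}
```

Whether this fits depends on how your real documents use repeated tags and text content. Adjust the list and leaf rules to match them.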