I have an XML file that looks like the one below. I'm trying to collect, in a simple list, the tags ('headers') of all elements that contain a value.
xml1 = '''
<Record>
<RecordID>London01</RecordID>
<Location>London</Location>
<Date>07-09-2020</Date>
<Time>08u53m45s</Time>
<Version>2.0.1</Version>
<Version_2>v1.9</Version_2>
<Max_30e>
<I_25Hz_1s>56.40</I_25Hz_1s>
<I_25Hz_2s>7.44</I_25Hz_2s>
</Max_30e>
<Max_50e>
<I_75Hz_1s>1.56</I_75Hz_1s>
<I_75Hz_2s>0.36</I_75Hz_2s>
</Max_50e>
<Sample>
<Vehicleid>5664</Vehicleid>
<NumberY>2742</NumberY>
<NumberX>SNG</NumberX>
<NumberZ>NSR</NumberZ>
</Sample>
<Sample>
<Vehicleid>1664</Vehicleid>
<NumberY>4201</NumberY>
<NumberX>ICM</NumberX>
<NumberZ>NSR</NumberZ>
</Sample>
</Record>'''
This is what I tried:
import xml.etree.ElementTree as ET

root = ET.fromstring(xml1)
values = []
for child in root:
    values.append(child.tag)
    for child1 in child:
        values.append(child1.tag)
print(values)
This is my current output:
['RecordID', 'Location', 'Date', 'Time', 'Version',
'Version_2', 'Max_30e', 'I_25Hz_1s', 'I_25Hz_2s', 'Max_50e',
'I_75Hz_1s', 'I_75Hz_2s', 'Sample', 'Vehicleid', 'NumberY',
'NumberX', 'NumberZ', 'Sample', 'Vehicleid', 'NumberY',
'NumberX', 'NumberZ']
This is my desired output:
['RecordID', 'Location', 'Date', 'Time', 'Version',
'Version_2', 'I_25Hz_1s', 'I_25Hz_2s', 'I_75Hz_1s', 'I_75Hz_2s',
'Vehicleid', 'NumberY', 'NumberX', 'NumberZ', 'Vehicleid',
'NumberY', 'NumberX', 'NumberZ']
CodePudding user response:
You can use a flag to record whether an element has children, and append its own tag only when it has none:
for child in root:
    has_child = False
    for child1 in child:
        has_child = True
        values.append(child1.tag)
    if not has_child:
        values.append(child.tag)
CodePudding user response:
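If you're only going two levels deep, this should be fine; going deeper would require recursion. The container tags such as Max_30e end up in your list because their text is not empty: it consists only of whitespace (the newline before the first child). Stripping the text and checking that something remains filters them out.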
values = []
for child in root:
    # text is None for completely empty elements, so fall back to ''
    if (child.text or '').strip():
        values.append(child.tag)
    # Check for children
    for grandChild in child:
        if (grandChild.text or '').strip():
            values.append(grandChild.tag)
print(values)
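If the nesting can go deeper than two levels, one option is to let ElementTree do the walking with Element.iter(), which visits every descendant in document order. The following is only a sketch of that idea: the helper name tags_with_values and the small sample string are made up for illustration, and it assumes "contains a value" means "has non-whitespace text".

```python
import xml.etree.ElementTree as ET


def tags_with_values(root):
    """Return the tags of all descendants of root whose text is not blank.

    Hypothetical helper, not part of ElementTree.
    """
    return [
        el.tag
        for el in root.iter()  # iter() yields root itself first, then all descendants
        if el is not root and (el.text or "").strip()  # text may be None
    ]


# Small made-up sample: one container element and one empty element.
sample = (
    "<Record><Location>London</Location>"
    "<Max_30e>\n<I_25Hz_1s>56.40</I_25Hz_1s>\n</Max_30e>"
    "<Empty/></Record>"
)
result = tags_with_values(ET.fromstring(sample))
print(result)  # ['Location', 'I_25Hz_1s']
```

Calling tags_with_values(ET.fromstring(xml1)) on the XML from the question should give the desired list, since the container elements only hold whitespace text.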