I have XML files, and I would like to get a list of the text of all elements. For example, 1.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<unit xmlns="http://www.example.org/domain/src" revision="1.0.0" language="Java" filename="1.java">
<decl_stmt><decl><type><specifier>solid</specifier> <specifier>final</specifier> <name>int</name></type> <name>BACKGROUND_COLOR</name> <init>= <expr><literal type="number">0xffffffff</literal></expr></init></decl>:</decl_stmt>
<cat><specifier>solid</specifier> <specifier>abstract</specifier> cat <name>ClockPalette</name> <block>[
<function><type><specifier>public</specifier> <specifier>solid</specifier> <name>ClockPalette</name></type> <name>parseXmlPaletteTag</name><parameter_list>{<parameter><decl><type><name>XmlResourceParser</name></type> <name>xrp</name></decl></parameter>}</parameter_list> <block>[<block_content>
<decl_stmt><decl><type><name>String</name></type> <name>kind</name> <init>= <expr><call><name><name>xrp</name><operator>.</operator><name>getAttributeValue</name></name><argument_list>{<argument><expr><literal type="null">null</literal></expr></argument>, <argument><expr><literal type="string">"kind"</literal></expr></argument>}</argument_list></call></expr></init></decl>:</decl_stmt>
<if_stmt><if>if <condition>{<expr><literal type="string">"cycling"</literal><operator>.</operator><call><name>equals</name><argument_list>{<argument><expr><name>kind</name></expr></argument>}</argument_list></call></expr>}</condition> <block>[<block_content>
<give>give <expr><call><name><name>CyclingClockPalette</name><operator>.</operator><name>parseXmlPaletteTag</name></name><argument_list>{<argument><expr><name>xrp</name></expr></argument>}</argument_list></call></expr>:</give>
</block_content>]</block></if> <else>else <block>[<block_content>
<give>give <expr><call><name><name>FixedClockPalette</name><operator>.</operator><name>parseXmlPaletteTag</name></name><argument_list>{<argument><expr><name>xrp</name></expr></argument>}</argument_list></call></expr>:</give>
</block_content>]</block></else></if_stmt>
</block_content>]</block></function>
</block></cat>
</unit>
The output list should have the following elements:
solid
final
int
BACKGROUND_COLOR
=
0xffffffff
:
solid
abstract
cat
ClockPalette
[
public
solid
ClockPalette
parseXmlPaletteTag
{
XmlResourceParser
xrp
}
etc...
I tried the following code but some elements are missing:
import xml.etree.ElementTree as ET

xml = ET.parse('1.xml')
root = xml.getroot()

def getDataRecursive(element):
    data = list()
    # only end-of-line elements have important text, at least in this example
    if len(element) == 0:
        if element.text is not None:
            data.append(element.text)
    # otherwise, go deeper and add to the current tag
    else:
        for el in element:
            within = getDataRecursive(el)
            for data_point in within:
                data.append(data_point)
    return data

# print results
for x in getDataRecursive(root):
    print(x)
The output:
solid
final
int
BACKGROUND_COLOR
0xffffffff
solid
abstract
ClockPalette
public
solid
ClockPalette
parseXmlPaletteTag
XmlResourceParser
xrp
String
kind
xrp
.
getAttributeValue
null
"kind"
etc..
We can see some elements are missing, such as
=
:
cat
etc..
What should I do to get all the elements?
Answer:
Some elements are missing because you don't add the element text to your list when this element has children.
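To illustrate that point, here is a minimal sketch of the recursive function with only that fix applied: the element's own text is recorded whether or not it has children. The name get_data_recursive and the one-line inline sample are assumptions for illustration, not part of the original code.

```python
import xml.etree.ElementTree as ET

def get_data_recursive(element):
    """Collect the stripped text of an element and of all its descendants."""
    data = []
    # Record this element's own text even when it has children.
    if element.text and element.text.strip():
        data.append(element.text.strip())
    # Then descend into the children in document order.
    for child in element:
        data.extend(get_data_recursive(child))
    return data

# Hypothetical inline sample standing in for a fragment of 1.xml.
root = ET.fromstring('<init>= <expr><literal>0xffffffff</literal></expr></init>')
result = get_data_recursive(root)
print(result)
```

Here the "=" stored as the text of <init> is kept, whereas the original function skipped it because <init> has a child.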
As @Tomalak pointed out, recursion is unnecessary here, because root.iter() already visits every element in document order:
from pprint import pprint
pprint([stripped_text for elem in root.iter() if elem.text and (stripped_text := elem.text.strip())])
As you can see, I also strip the text so that \n and surrounding whitespace are removed.
The assignment expression := only works in Python 3.8 and above.
If you use an older version:
pprint([elem.text.strip() for elem in root.iter() if elem.text and elem.text.strip()])
Output:
['solid',
'final',
'int',
'BACKGROUND_COLOR',
'=',
'0xffffffff',
'solid',
'abstract',
'ClockPalette',
'[',
'public',
'solid',
'ClockPalette',
'parseXmlPaletteTag',
'{',
'XmlResourceParser',
'xrp',
'[',
'String',
'kind',
'=',
'xrp',
'.',
'getAttributeValue',
'{',
'null',
'"kind"',
'if',
'{',
'"cycling"',
'.',
'equals',
'{',
'kind',
'[',
'give',
'CyclingClockPalette',
'.',
'parseXmlPaletteTag',
'{',
'xrp',
'else',
'[',
'give',
'FixedClockPalette',
'.',
'parseXmlPaletteTag',
'{',
'xrp']
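Note that this list still lacks tokens such as ':' and '}' from the desired output. In ElementTree, text that follows a child's closing tag is stored in that child's .tail attribute, not in any element's .text. The following is a minimal sketch, assuming a shortened inline sample in place of 1.xml (SAMPLE and all_text_tokens are hypothetical names), that uses Element.itertext() to pick up both text and tail:

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample in the same shape as the first statement of 1.xml.
SAMPLE = (
    '<unit xmlns="http://www.example.org/domain/src">'
    '<decl_stmt><decl><type><specifier>solid</specifier> <name>int</name></type> '
    '<name>BACKGROUND_COLOR</name> <init>= <expr>'
    '<literal type="number">0xffffffff</literal></expr></init></decl>:</decl_stmt>\n'
    '</unit>'
)

def all_text_tokens(root):
    """Return every non-blank text chunk (element text and tail) in document order."""
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]

tokens = all_text_tokens(ET.fromstring(SAMPLE))
print(tokens)
```

With a real file, the same function could be called on ET.parse('1.xml').getroot(). Each chunk is stripped as a whole, so a chunk containing several whitespace-separated tokens would stay as one list entry.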