I'm parsing an SVG file string with minidom and extracting all the tags, which works without a problem. What I want to do now is also obtain a list of all the attributes each path tag contains. I could easily write my own parser using regex, but I'd like to use something more reliable than my own spaghetti code. I tried calling path._get_attributes(), but that raises a KeyError. Here's my code so far.
from xml.dom import minidom
svg_string = '''<?xml version='1.0' encoding='iso-8859-1'?>
<svg version='1.1' baseProfile='full'
xmlns='http://www.w3.org/2000/svg'
xmlns:rdkit='http://www.rdkit.org/xml'
xmlns:xlink='http://www.w3.org/1999/xlink'
xml:space='preserve'
width='300px' height='300px' viewBox='0 0 300 300'>
<!-- END OF HEADER -->
<rect style='opacity:1.0;fill:#FFFFFF;stroke:none' width='300.0' height='300.0' x='0.0' y='0.0'> </rect>
<path class='bond-0 atom-0 atom-1' d='M 49.1,144.6 L 71.8,157.7' style='fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />
<path class='bond-0 atom-0 atom-1' d='M 71.8,157.7 L 94.5,170.8' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />
<path class='bond-1 atom-1 atom-2' d='M 94.5,170.8 L 150.5,138.5' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />
<path class='bond-2 atom-2 atom-3' d='M 150.5,138.5 L 206.4,170.8' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />
<path class='bond-3 atom-3 atom-4' d='M 206.4,170.8 L 229.1,157.7' style='fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />
<path class='bond-3 atom-3 atom-4' d='M 229.1,157.7 L 251.8,144.6' style='fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1' />
<path class='atom-0' d='M 13.6 129.4
L 16.1 129.4
L 16.1 137.2
L 25.5 137.2
L 25.5 129.4
L 28.0 129.4
L 28.0 147.7
L 25.5 147.7
L 25.5 139.3
L 16.1 139.3
L 16.1 147.7
L 13.6 147.7
L 13.6 129.4
' fill='#FF0000'/>
<path class='atom-0' d='M 30.2 138.5
Q 30.2 134.1, 32.3 131.7
Q 34.5 129.2, 38.5 129.2
Q 42.6 129.2, 44.8 131.7
Q 46.9 134.1, 46.9 138.5
Q 46.9 143.0, 44.8 145.5
Q 42.6 148.0, 38.5 148.0
Q 34.5 148.0, 32.3 145.5
Q 30.2 143.0, 30.2 138.5
M 38.5 145.9
Q 41.3 145.9, 42.8 144.1
Q 44.4 142.2, 44.4 138.5
Q 44.4 134.9, 42.8 133.1
Q 41.3 131.3, 38.5 131.3
Q 35.8 131.3, 34.2 133.1
Q 32.7 134.9, 32.7 138.5
Q 32.7 142.2, 34.2 144.1
Q 35.8 145.9, 38.5 145.9
' fill='#FF0000'/>
<path class='atom-4' d='M 254.0 138.5
Q 254.0 134.1, 256.1 131.7
Q 258.3 129.2, 262.4 129.2
Q 266.4 129.2, 268.6 131.7
Q 270.8 134.1, 270.8 138.5
Q 270.8 143.0, 268.6 145.5
Q 266.4 148.0, 262.4 148.0
Q 258.3 148.0, 256.1 145.5
Q 254.0 143.0, 254.0 138.5
M 262.4 145.9
Q 265.1 145.9, 266.6 144.1
Q 268.2 142.2, 268.2 138.5
Q 268.2 134.9, 266.6 133.1
Q 265.1 131.3, 262.4 131.3
Q 259.6 131.3, 258.0 133.1
Q 256.5 134.9, 256.5 138.5
Q 256.5 142.2, 258.0 144.1
Q 259.6 145.9, 262.4 145.9
' fill='#FF0000'/>
<path class='atom-4' d='M 272.0 129.4
L 274.5 129.4
L 274.5 137.2
L 283.9 137.2
L 283.9 129.4
L 286.4 129.4
L 286.4 147.7
L 283.9 147.7
L 283.9 139.3
L 274.5 139.3
L 274.5 147.7
L 272.0 147.7
L 272.0 129.4
' fill='#FF0000'/>
</svg>'''
def parse_svg(svg_string):
    '''Gets all the paths and their attributes from an svg string.'''
    doc = minidom.parseString(svg_string)
    paths = [path for path in doc.getElementsByTagName('path')]
    # this is where I want to make a list comprehension to get a list of attributes
    # for all the attributes a path contains.
    doc.unlink()

parse_svg(svg_string)
CodePudding user response:
You can access an element's attributes through its 'attributes' property, which behaves like a map/dict. To get all the name-value pairs, call its 'items()' method. Your code could look like this:
def parse_svg(svg_string):
    '''Gets all the paths and their attributes from an svg string.'''
    doc = minidom.parseString(svg_string)
    paths = [path for path in doc.getElementsByTagName('path')]
    for path in paths:
        # all the attributes of the path, as a plain dict of name -> value
        attrs = dict(path.attributes.items())
        print(attrs)  # do whatever you want with attrs; here I just print them
    doc.unlink()
The printed output looks like this:
{'class': 'bond-0 atom-0 atom-1', 'd': 'M 49.1,144.6 L 71.8,157.7', 'style': 'fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1'}
{'class': 'bond-0 atom-0 atom-1', 'd': 'M 71.8,157.7 L 94.5,170.8', 'style': 'fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1'}
{'class': 'bond-1 atom-1 atom-2', 'd': 'M 94.5,170.8 L 150.5,138.5', 'style': 'fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1'}
{'class': 'bond-2 atom-2 atom-3', 'd': 'M 150.5,138.5 L 206.4,170.8', 'style': 'fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1'}
{'class': 'bond-3 atom-3 atom-4', 'd': 'M 206.4,170.8 L 229.1,157.7', 'style': 'fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1'}
{'class': 'bond-3 atom-3 atom-4', 'd': 'M 229.1,157.7 L 251.8,144.6', 'style': 'fill:none;fill-rule:evenodd;stroke:#FF0000;stroke-width:2.0px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1'}
{'class': 'atom-0', 'd': 'M 13.6 129.4 L 16.1 129.4 L 16.1 137.2 L 25.5 137.2 L 25.5 129.4 L 28.0 129.4 L 28.0 147.7 L 25.5 147.7 L 25.5 139.3 L 16.1 139.3 L 16.1 147.7 L 13.6 147.7 L 13.6 129.4 ', 'fill': '#FF0000'}
{'class': 'atom-0', 'd': 'M 30.2 138.5 Q 30.2 134.1, 32.3 131.7 Q 34.5 129.2, 38.5 129.2 Q 42.6 129.2, 44.8 131.7 Q 46.9 134.1, 46.9 138.5 Q 46.9 143.0, 44.8 145.5 Q 42.6 148.0, 38.5 148.0 Q 34.5 148.0, 32.3 145.5 Q 30.2 143.0, 30.2 138.5 M 38.5 145.9 Q 41.3 145.9, 42.8 144.1 Q 44.4 142.2, 44.4 138.5 Q 44.4 134.9, 42.8 133.1 Q 41.3 131.3, 38.5 131.3 Q 35.8 131.3, 34.2 133.1 Q 32.7 134.9, 32.7 138.5 Q 32.7 142.2, 34.2 144.1 Q 35.8 145.9, 38.5 145.9 ', 'fill': '#FF0000'}
{'class': 'atom-4', 'd': 'M 254.0 138.5 Q 254.0 134.1, 256.1 131.7 Q 258.3 129.2, 262.4 129.2 Q 266.4 129.2, 268.6 131.7 Q 270.8 134.1, 270.8 138.5 Q 270.8 143.0, 268.6 145.5 Q 266.4 148.0, 262.4 148.0 Q 258.3 148.0, 256.1 145.5 Q 254.0 143.0, 254.0 138.5 M 262.4 145.9 Q 265.1 145.9, 266.6 144.1 Q 268.2 142.2, 268.2 138.5 Q 268.2 134.9, 266.6 133.1 Q 265.1 131.3, 262.4 131.3 Q 259.6 131.3, 258.0 133.1 Q 256.5 134.9, 256.5 138.5 Q 256.5 142.2, 258.0 144.1 Q 259.6 145.9, 262.4 145.9 ', 'fill': '#FF0000'}
{'class': 'atom-4', 'd': 'M 272.0 129.4 L 274.5 129.4 L 274.5 137.2 L 283.9 137.2 L 283.9 129.4 L 286.4 129.4 L 286.4 147.7 L 283.9 147.7 L 283.9 139.3 L 274.5 139.3 L 274.5 147.7 L 272.0 147.7 L 272.0 129.4 ', 'fill': '#FF0000'}
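If you would rather get the attributes back as a list (the list comprehension the question asks for) instead of printing them, something like the following might work. It is only a sketch: the function name path_attributes and the small sample_svg string are made up for illustration, and it assumes every path's attributes should be collected into a plain dict.

```python
from xml.dom import minidom


def path_attributes(svg_string):
    """Return one dict of attribute name -> value per <path> element."""
    doc = minidom.parseString(svg_string)
    try:
        # attributes.items() gives (name, value) pairs, so dict() accepts it directly.
        return [dict(path.attributes.items())
                for path in doc.getElementsByTagName('path')]
    finally:
        # Release the DOM tree even if something above raises.
        doc.unlink()


# Hypothetical, shortened SVG used only for this example.
sample_svg = (
    "<svg xmlns='http://www.w3.org/2000/svg'>"
    "<rect width='10' height='10'/>"
    "<path class='atom-0' d='M 0 0 L 1 1' fill='#FF0000'/>"
    "<path d='M 1 1 L 2 2'/>"
    "</svg>"
)

attrs = path_attributes(sample_svg)
print(attrs)
```

Because the dicts are built before doc.unlink() runs, they stay usable after the document is released. A single value can then be read with ordinary dict access, for example attrs[0]['d'].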