Opening a draw.io file using python

Time:12-01

I am trying to read the data from a draw.io drawing using Python.

Apparently the format is XML, with some portions in "mxfile" encoding.

(That is, a section of the XML is deflated, then base64-encoded.)

Here's the official TFM: https://drawio-app.com/extracting-the-xml-from-mxfiles/

And their online decoder tool: https://jgraph.github.io/drawio-tools/tools/convert.html

So I tried to decode the mxfile portion using the standard Python tools:

import base64
import zlib

s="7VvbcuI4FPwaHpOybG55BHKZmc1kmSGb7KvAArTIFiuLEObr58jINxTATvA4IVSlKtaxLFvq1lGrbWpOz3u EXg /c5dwmq25T7XnMuabSOr3YR/KrJaR9pIByaCurpSEhjQXyS6UkcX1CVBpqLknEk6zwZH3PfJSGZiWAi zFYbc5a96xxPiBEYjDAzo4/UlVPdC7uVxL8QOplGd0bNi/UZD0eVdU CKXb5MhVyrmpOT3Au10fec48wNXjRuDx XT2y21nz5tuP4H/8T/ev 7uHs3Vj10UuibsgiC9f3fSv2fj6y0P9v3/n/esfS umM/x2pi xnjBb6PHqExFwX/dYrqJhDJbUY9iHUnfMfTnQZ2AQupjRiQ/HI3g6IiDwRISkgEBHn5B8DtHRlDL3Fq/4QvUhkHg0i0rdKRf0FzSLGZxCEIDTQmoy2c1MjYG6EsIWRAUJoE4/GhgUh25xIHWdEWcMzwM6DB9YVfGwmFC/y6XkXtQQX/gucXUpRjosSMFnMXfU9Tnh0LCp0SDPKTJqeG4I94gUK6iiz8ZM01MNReVlQlzU1LFpmrROW08YPVkmcdvx7X7C5ML BAYhuZ zcb96zvvZzeztMAPgfSxJVw1jkKYhHKS6moRCchYgKjKIeoc9YtAURlqmKMnIWG4lZDDHI pPbsM6l/Uk8lP3VIU4XDtmIRmm1HWJH5JFYonXfFIMmXPqy3AoGl34gwHrWeeNWgMeqAdllJThT1UXssd94BWmIYEIkHVJFGFfoNbOabufWqssYkWRTRMpA2lR/Gwz0Uy5r8h4t/CGkDaODckdGWUqPaYPy8K7YVeMt2PgfeVhqi7ruC7k6OAE EEBb7UrBrxuAG4gzGioH/RooBfX1j3wewCkai7C 17R4fIMGZxwTE44L DP8JCwPg opFy1L9Z1N3hRVdZGVj0fqjuW/zeB2jCz9kKMpjhQiRtk1wyGNzw6wvlcGqio6tzcNFAdyIWruplT9Vsn1X841Y82VL/TLFf1ow3V77Tfr pvbWfqserGnGmnmZtm72UH0Daw7MDTK/fGtr7DUnJ0SB5UEBbGu/IdwMVJEB4c1Lwqvyw9iEy/8CskfusK0AiXWtu656rsC65aO7IZndZA9bIwbledqJHptd0QteIOiEd9LBTg93hGTJP4o NbFqTVS/7oAXZlY K7HfXCBUpDxpXa7kJIy3FkrYvXlEUr1x69nF3 iDsh0dQhbMiXV0mgGwbgRMSUwmo74LAtJfshg/3FhOTYzamn3QnsS0AKwrCkT9n3Tju0eV8RN9HltpXV5bblZJtYd1JflX7RU7Sh9SgYDR3Mqje9v77gYxIE3JTrpx1m TtMZ3PHl3eH2bL2kviFDaZTz7HBbL2PDSYybcsBZlhn3E 4tsWT9 NsLJHpUhroffadRnFY8 4fS9tqmC7lp1IsEWLvWrKgjUzfeqVkcTYaslsbz1K2ZDGNxm2vKU CpXzB0rDaGTrk/hDGRjsWme2KpdH4QB/CmD7qQApCzJc3n0WxtHLT690oFtMb7VF5fJrzoA54cZwrt8Bt0y6FpC2P77O1ioGu/OMX27RMQdmrVdy2etw9AX5gwHN/GFMe4qah2oMxkUfoHFSNtfNKMXY4rE1D0wD50xsMxXFt5JRhZTkMtun9PQBE7jEu0OWh2Kw8E5v2398LOV8oe6Gj3lXeqnlwQjQ3oheV59ti1h fh2NdzNyLfUFUvdWnx3av0xdhudfq0zgrKqVtjbp oDe6fvH7nJgwdraJvK5fo76noS2un9HQ2eYbp412 HgckFKMQ9s0Dq3z8wj4hK6hGZdKBHvSzlBbcus1vItHs0nI3x5nXMB5nycGpHa77fw5IZpf ieX 
rFq8c/P8ht1Z29kVETMPwaXaZ7lxyrSTx8VrMPM/uib3D8OnemZMeiFWuDxVu8zJcc3UTVVcB4HP9bou7Eu5KK/kRgGAbZxJf86cXEYpjhZFz9K0m/hChSTH1yvqyc/W3eufgM="

result=zlib.decompress(base64.b64decode(s))

Throws the exception:

zlib.error: Error -3 while decompressing data: incorrect header check

Meanwhile, their tool above returns XML just fine when given the exact same data.

What am I missing?

Answer:

Try this:

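import base64
import zlib
import xml.etree.ElementTree as ET
from urllib.parse import unquote

filename = 'diagram.drawio'  # placeholder: path to your draw.io file

tree = ET.parse(filename)
# The <diagram> element's text is base64-encoded raw deflate data.
data = base64.b64decode(tree.find('diagram').text)
# wbits=-15 selects a raw deflate stream (no zlib header or checksum).
xml = zlib.decompress(data, wbits=-15).decode('utf-8')
# The inflated text is URL-encoded, so unquote it last.
xml = unquote(xml)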
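The same steps can be wrapped in a small helper that handles every page of a file. This is only a sketch: decode_diagram, encode_diagram and decode_all_pages are names made up for this example, not part of any draw.io API, and it assumes every diagram element holds a compressed payload (a file saved uncompressed would need a different path).

```python
import base64
import zlib
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import quote, unquote


def decode_diagram(payload: str) -> str:
    """Base64-decode, raw-inflate, then URL-decode one <diagram> payload."""
    raw = base64.b64decode(payload)
    inflated = zlib.decompress(raw, wbits=-15)  # raw deflate, no zlib header
    return unquote(inflated.decode("utf-8"))


def encode_diagram(xml_text: str) -> str:
    """Inverse of decode_diagram (hypothetical helper, handy for testing)."""
    quoted = quote(xml_text, safe="").encode("utf-8")
    compressor = zlib.compressobj(wbits=-15)  # emit a raw deflate stream
    deflated = compressor.compress(quoted) + compressor.flush()
    return base64.b64encode(deflated).decode("ascii")


def decode_all_pages(mxfile_text: str) -> List[str]:
    """Return the decoded XML of every <diagram> element in an mxfile string."""
    root = ET.fromstring(mxfile_text)
    return [decode_diagram(d.text) for d in root.iter("diagram")]
```

Round-tripping a string through encode_diagram and decode_diagram is a quick way to check the pipeline without needing a real drawing.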

If you read the source of their html tool, you will see this:

data = String.fromCharCode.apply(null, new Uint8Array(pako.deflateRaw(data)));

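They are using a JS library called pako in "raw" mode, which writes a bare deflate stream with no zlib header or checksum. Python's zlib.decompress expects that header by default, hence the "incorrect header check" error; passing a negative wbits value (-15 here) tells it to expect raw data. You can confirm the setting in the pako source on GitHub.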
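To see the difference for yourself, compare a zlib-wrapped stream with a raw one. This is a small illustration using only Python's standard library; the sample bytes are made up for the example and are not taken from a real drawing.

```python
import zlib

sample = b"<mxGraphModel/>" * 4

# zlib.compress() wraps the deflate data in a header and a trailing checksum.
wrapped = zlib.compress(sample)

# A negative wbits value produces a raw deflate stream, comparable to
# what pako.deflateRaw emits.
compressor = zlib.compressobj(wbits=-15)
raw = compressor.compress(sample) + compressor.flush()

# The default decompress() expects the zlib header, so raw data is rejected.
try:
    zlib.decompress(raw)
    default_failed = False
except zlib.error:
    default_failed = True

# With wbits=-15 the raw stream inflates correctly.
restored = zlib.decompress(raw, wbits=-15)
```

Here the default call fails on the raw stream, while the call with wbits=-15 returns the original bytes.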
