Home > Back-end >  Ho to parse xml file to xlsx in python
Ho to parse xml file to xlsx in python

Time:12-21

I have a file xml like this (input):

> <?xml version="1.0" encoding="UTF-8" standalone="yes"?\>
> <obs id="0"\>
> <dim name="Column1" value="a"/\>
> <dim name="Column2" value="b"/\>
> </obs\>
> <obs id="1">
> <dim name="Column1" value="tr"/\>
> <dim name="Column2" value="yu"/\>
> </obs\>

How can i do for parse in xlsx file? i would like to have a xlsx file like this:

Column1|Column2 a |b tr |yu

Column1 Column2
a b
tr yu

Thanks a lot. I've tried with other xml parser but i did not realised the solution.

CodePudding user response:

You can use BeautifulSoup to parse the XML document pandas to save the dataframe to CSV and/or Excel format:

import pandas as pd
from bs4 import BeautifulSoup


with open("your_file.xml", "r") as f_in:
    soup = BeautifulSoup(f_in.read(), "html.parser")  # <-- you can ignore the warning or use different parser, such as `xml`

all_data = []
for obs in soup.select("obs"):
    d = {}
    for dim in obs.select("dim[name][value]"):
        d[dim["name"]] = dim["value"]

    all_data.append(d)

df = pd.DataFrame(all_data)
print(df)

df.to_csv('data.csv', index=False)

This prints:

  Column1 Column2
0       a       b
1      tr      yu

and saves data.csv.


Input file was:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<obs id="0">
<dim name="Column1" value="a" />
<dim name="Column2" value="b" />
</obs>
<obs id="1">
<dim name="Column1" value="tr" />
<dim name="Column2" value="yu" />
</obs>
  • Related