I have a file xml like this (input):
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?\>
> <obs id="0"\>
> <dim name="Column1" value="a"/\>
> <dim name="Column2" value="b"/\>
> </obs\>
> <obs id="1">
> <dim name="Column1" value="tr"/\>
> <dim name="Column2" value="yu"/\>
> </obs\>
How can i do for parse in xlsx file? i would like to have a xlsx file like this:
Column1|Column2 a |b tr |yu
Column1 | Column2 |
---|---|
a | b |
tr | yu |
Thanks a lot. I've tried with other xml parser but i did not realised the solution.
CodePudding user response:
You can use BeautifulSoup
to parse the XML document pandas
to save the dataframe to CSV and/or Excel format:
import pandas as pd
from bs4 import BeautifulSoup
with open("your_file.xml", "r") as f_in:
soup = BeautifulSoup(f_in.read(), "html.parser") # <-- you can ignore the warning or use different parser, such as `xml`
all_data = []
for obs in soup.select("obs"):
d = {}
for dim in obs.select("dim[name][value]"):
d[dim["name"]] = dim["value"]
all_data.append(d)
df = pd.DataFrame(all_data)
print(df)
df.to_csv('data.csv', index=False)
This prints:
Column1 Column2
0 a b
1 tr yu
and saves data.csv
.
Input file was:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<obs id="0">
<dim name="Column1" value="a" />
<dim name="Column2" value="b" />
</obs>
<obs id="1">
<dim name="Column1" value="tr" />
<dim name="Column2" value="yu" />
</obs>