I'm using Python pandas and Flask for some postprocessing tasks (analysis and visualization). Until now I uploaded and read *.csv, *.xlsx and *.xls files via pd.read_csv and pd.read_excel. Everything worked quite fine.
Now I have an *.xml file as the data source and tried to follow the same pattern.
So I tried:
    <form action="/input" method="POST" enctype="multipart/form-data">
      <input type="file" name="file">
      <input type="submit" name="Preview" value="Preview Data">
    </form>
    from flask import Flask, render_template, request
    import pandas as pd
    import xml.etree.ElementTree as ET

    app = Flask(__name__)

    @app.route("/input", methods=['POST', 'GET'])
    def input():
        if request.method == 'POST':
            if request.form['Preview'] == "Preview Data":
                file = request.files['file']  # werkzeug FileStorage object
                filename = file.filename
                if '.xml' in filename:
                    # This is the line that raises the error below
                    content = pd.read_xml(file, parser='lxml')
But when I pass an .xml file to the app via the form, I get the error:
      File "C:\ProgramData\MiniforgeEnvs\TestEnv\lib\site-packages\pandas\io\xml.py", line 627, in _parse_doc
        with preprocess_data(handle_data) as xml_data:
    AttributeError: __enter__
I tried checking different options:
- When I use the built-in xml.etree module, it works fine:
      import xml.etree.ElementTree as ET

      if '.xml' in filename:
          tree = ET.parse(file)
          root = tree.getroot()
          print(root[1][0][1].attrib)
- When I load the .xml directly from the app directory into pd.read_xml(), it also works fine:
      if '.xml' in filename:
          content = pd.read_xml('SampleExport.xml', parser='lxml')
- I tried different parsers: "lxml" and "etree".

But in the end, whenever I pass the .xml via the form input and use pd.read_xml(file, parser='lxml'), I get the error above.
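A likely reason the ElementTree check works with the upload: xml.etree.ElementTree.parse accepts either a filename or a file object, and for a file object it only needs to call read() on it. The sketch below shows this with an in-memory io.BytesIO standing in for the uploaded file; the sample XML is made up for illustration and is not the structure of SampleExport.xml.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the uploaded file: any object with read() works here.
upload = io.BytesIO(b"<root><item id='1'/><item id='2'/></root>")

tree = ET.parse(upload)  # ET.parse accepts a filename or a file object
root = tree.getroot()

# Collect the id attribute of every <item> element
ids = [item.attrib["id"] for item in root.iter("item")]
print(root.tag, ids)
```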
Answer:
I just solved my issue, even though I'm not quite sure why pd.read_xml() behaves differently from pd.read_csv() or pd.read_excel().
pd.read_xml is not able to read a FileStorage object. The object returned by request.files[] is an instance of the class werkzeug.datastructures.FileStorage(stream=None, filename=None, name=None, content_type=None, content_length=None, headers=None).
Using its read() method, I extracted the file contents themselves as bytes:
    filestorage = request.files['file']
    file = filestorage.read()
Passing these bytes to pd.read_xml works fine.
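A minimal sketch of the same workaround as a reusable helper, assuming pandas 1.3 or newer (when read_xml was added). The function name read_uploaded_xml is hypothetical. It wraps the bytes in io.BytesIO so pandas receives an ordinary file-like object; I believe newer pandas versions (2.1+) deprecate passing literal XML content straight to read_xml, so wrapping is the safer choice. It defaults to parser="etree" only to avoid requiring lxml; parser="lxml" should work the same way if lxml is installed.

```python
import io

import pandas as pd


def read_uploaded_xml(upload, parser="etree"):
    """Read an uploaded XML file into a DataFrame (hypothetical helper).

    `upload` is assumed to be any object whose read() returns bytes, such as
    the werkzeug FileStorage from request.files['file'] in a Flask view.
    """
    raw = upload.read()  # raw bytes of the uploaded file
    # Hand pandas a plain in-memory file object instead of the FileStorage wrapper
    return pd.read_xml(io.BytesIO(raw), parser=parser)


# Local demo: io.BytesIO stands in for request.files['file']
sample = b"<data><row><a>1</a><b>x</b></row><row><a>2</a><b>y</b></row></data>"
df = read_uploaded_xml(io.BytesIO(sample))
print(df)
```

In the Flask view this would be called as content = read_uploaded_xml(request.files['file']).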
Can anybody explain why the _parse_doc() function of pd.read_xml() is not able to read a FileStorage object?
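One hedged reading of the traceback: Python's with statement looks up __enter__ and __exit__ on the object's type, bypassing __getattr__. werkzeug's FileStorage forwards unknown attributes to its underlying stream via __getattr__, so it exposes read() like a file but is not itself a context manager. If pandas ends up passing the FileStorage object itself into `with preprocess_data(handle_data)`, that would raise exactly AttributeError: __enter__ (Python 3.11+ raises a TypeError instead). The StreamProxy class and try_with helper below are hypothetical illustrations, not werkzeug or pandas code.

```python
import io


class StreamProxy:
    """Hypothetical wrapper that forwards attribute access to a stream,
    similar in spirit to werkzeug's FileStorage."""

    def __init__(self, stream):
        self.stream = stream

    def __getattr__(self, name):
        # Only called when normal lookup fails; forwards read(), seek(), etc.
        return getattr(self.stream, name)


def try_with(obj):
    """Return 'ok' if obj works in a with statement, else the exception type name."""
    try:
        with obj:
            pass
        return "ok"
    except (AttributeError, TypeError) as exc:  # AttributeError before 3.11, TypeError after
        return type(exc).__name__


proxy = StreamProxy(io.BytesIO(b"<root/>"))
print(proxy.read())                 # forwarded to the BytesIO
print(hasattr(proxy, "__enter__"))  # explicit lookup goes through __getattr__
print(try_with(proxy))              # the with statement does not
```

Note that hasattr(proxy, "__enter__") is True while the with statement still fails: explicit attribute access goes through __getattr__, but special-method lookup for with does not.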