I would like to read a blob file with the pandas.read_xml() function, like this:
pandas.read_xml(blob.open())
When printing the blob it looks like this:
<Blob: Bucket, filename.0.xml.gz, 1612169959288959>
Calling blob.open() returns this:
<_io.TextIOWrapper encoding='iso-8859-1'>
With this I get the error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
When I change the code to blob.open(mode='rt', encoding='iso-8859-1'), I get the error
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
Is there a way to read an XML file from a bucket on GCS at all?
CodePudding user response:
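The .gz suffix and the byte 0x8b at position 1 suggest that the blob is gzip-compressed (gzip streams begin with the bytes 0x1f 0x8b), so it probably has to be decompressed before it is parsed. A minimal sketch, assuming the google-cloud-storage method Blob.download_as_bytes() and a file small enough to hold in memory; decompress_xml and read_blob_xml are hypothetical helper names, not part of either library:

```python
import gzip
import io


def decompress_xml(raw: bytes) -> io.BytesIO:
    """Return the XML payload as an in-memory binary file object."""
    # Gzip streams start with the magic bytes 0x1f 0x8b; the 0x8b at
    # position 1 is the byte named in the UnicodeDecodeError.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    # Data that is not gzip-compressed is passed through unchanged.
    return io.BytesIO(raw)


def read_blob_xml(blob):
    """Download a blob and parse it into a DataFrame (hypothetical helper)."""
    import pandas as pd  # imported here so decompress_xml works without pandas

    # Assumption: `blob` is a google.cloud.storage.Blob like the one in the question.
    return pd.read_xml(decompress_xml(blob.download_as_bytes()))
```

With the blob from the question this would be called as df = read_blob_xml(blob). Handing the parser bytes rather than decoded text means no encoding has to be chosen in blob.open(); if the file carries an XML declaration, the parser can take the encoding from there.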