How to access a GCS Blob that contains an xml file in a bucket with the pandas.read_xml() function i

Time:08-16

I would like to access a blob file via the pandas.read_xml() function. Like this:

pandas.read_xml(blob.open())

When printing the blob it looks like this:

<Blob: Bucket, filename.0.xml.gz, 1612169959288959>

The blob.open() function gives this:

<_io.TextIOWrapper encoding='iso-8859-1'>

and I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte. When I change the code to blob.open(mode='rt', encoding='iso-8859-1'), I get the error lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1.

Is there even a way to read an XML file from a bucket on GCS?

CodePudding user response:
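
Both errors are consistent with the blob being gzip-compressed: the name ends in .xml.gz, and 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b). Opening the blob in text mode tries to decode compressed bytes as characters. Switching to iso-8859-1 only hides the decode error, because that encoding accepts any byte, so the XML parser then receives binary garbage instead of a '<'. One possible fix is to download the raw bytes, decompress them, and pass the result to pandas.read_xml(). The sketch below assumes the google-cloud-storage client with default credentials; the helper names maybe_gunzip and read_xml_blob are made up for illustration.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(raw: bytes) -> bytes:
    """Return the decompressed bytes if raw is gzip data, else raw unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def read_xml_blob(bucket_name: str, blob_name: str):
    """Download a (possibly gzipped) XML blob and parse it into a DataFrame."""
    # Third-party imports are kept local so maybe_gunzip works without them.
    import pandas as pd
    from google.cloud import storage

    client = storage.Client()  # assumes default credentials are configured
    blob = client.bucket(bucket_name).blob(blob_name)
    raw = blob.download_as_bytes()  # raw bytes, no text decoding
    # Hand pandas a binary buffer so the XML parser handles the encoding itself.
    return pd.read_xml(io.BytesIO(maybe_gunzip(raw)))
```

With a placeholder bucket name, usage would look like df = read_xml_blob("my-bucket", "filename.0.xml.gz"). Checking the magic bytes rather than the file extension means data that arrives already decompressed passes through unchanged.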

