Open XML file in AWS SageMaker

Time:10-20

I am looking for a way to display XML data and extract it into a more accessible format. I am on AWS SageMaker, connected to S3 (via the code below), but I am not sure how to open this XML file in my notebook.

import boto3

my_bucket = 'my-s3-bucket'
my_file = 'my_file.xml'
s3client = boto3.client('s3')
# Fetch the object; the content is available under response['Body']
response = s3client.get_object(Bucket=my_bucket, Key=my_file)

CodePudding user response:

The body returned by S3 is in bytes; for an XML file it has the form b"your_xml_content".

So you need to read those bytes and then parse them with any XML library to navigate the content.

A complete example is this:

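from lxml import etree
import boto3

my_bucket = 'my-s3-bucket'
my_file = 'my_file.xml'

# Download the object and read its body as bytes
xml_bytes = boto3.resource('s3').Object(my_bucket, my_file).get()['Body'].read()
# Parse the bytes; doc is the root element of the document
doc = etree.XML(xml_bytes)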
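
If lxml is not installed in the notebook kernel, the same parsing step can probably be done with the standard library's xml.etree.ElementTree. The sketch below is only an illustration: the literal bytes string is a hypothetical stand-in for the S3 body, so the snippet runs without AWS access.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bytes you would get from
# response['Body'].read() or .get()['Body'].read() on the S3 object.
xml_bytes = b"<note><to>Tove</to><from>Jani</from></note>"

# fromstring accepts bytes as well as str and returns the root element
root = ET.fromstring(xml_bytes)

print(root.tag)              # tag name of the root element
print(root.find("to").text)  # text of the first <to> child
```

In a real notebook you would replace the literal with the bytes read from S3; the rest should stay the same.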

Assuming your file is in this form:

<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>

you can navigate the object with a simple for loop:

for element in doc.iter():  # loop over doc itself (no .iter()) to visit only the root's direct children
    print("%s - %s" % (element.tag, element.text))

The output will be (the root's text is only the whitespace before its first child, hence the blank-looking line):

note - 
        
to - Tove
from - Jani
heading - Reminder
body - Don't forget me this weekend!
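
To get the data into a more accessible format, as the question asks, one option is to collect the child elements into a plain dictionary. This is a minimal sketch, assuming a flat file like the note above, where every child tag is unique and has no nested elements. It uses the standard library's xml.etree.ElementTree rather than lxml, and a literal bytes string stands in for the S3 body.

```python
import xml.etree.ElementTree as ET

# Stand-in for the bytes read from S3 (same sample note as above)
xml_bytes = b"""<note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(xml_bytes)

# Map each direct child tag to its text.
# Assumption: tags are unique and not nested; repeated tags would overwrite each other.
record = {child.tag: child.text for child in root}

print(record)
```

A dictionary like this can then be handed to whatever tool you prefer for tabular data. For files with repeated or nested elements, you would need a list of such dictionaries or a recursive walk instead.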