I am looking for a way to display XML data and extract it into a more accessible format. I am on AWS SageMaker and connected to S3 (via the code below), but I am not sure how to simply open this XML file in my notebook.
import boto3
my_bucket = 'my-s3-bucket'
my_file = 'my_file.xml'
s3client = boto3.client('s3')
response = s3client.get_object(Bucket=my_bucket, Key=my_file)
The Body returned by S3 is a stream of bytes. Calling .read() on it gives you the raw content, which for an XML file looks like b"your_xml_content".
So you need to read those bytes and then parse them with an XML library (for example lxml, or xml.etree.ElementTree from the standard library) so you can navigate the content.
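For instance, using only the standard library and the get_object call from the question, the read-then-parse step could look like the sketch below. parse_s3_xml is a hypothetical helper name, and s3client is assumed to be a boto3 S3 client such as boto3.client('s3').

```python
import xml.etree.ElementTree as ET


def parse_s3_xml(s3client, bucket: str, key: str) -> ET.Element:
    """Fetch an XML object from S3 and return its root element.

    Hypothetical helper: s3client is assumed to be a boto3 S3 client.
    """
    response = s3client.get_object(Bucket=bucket, Key=key)
    # response['Body'] is a streaming body; .read() returns the raw bytes
    xml_bytes = response['Body'].read()
    # fromstring accepts bytes directly and returns the root element
    return ET.fromstring(xml_bytes)
```

Usage would be root = parse_s3_xml(s3client, my_bucket, my_file), after which root.tag, root.find(...) and root.iter() are available.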
A complete example is this:
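from lxml import etree
import boto3
my_bucket = 'my-s3-bucket'
my_file = 'my_file.xml'
# Read the S3 object's body as raw bytes, then parse it; doc is the root element
xml_bytes = boto3.resource('s3').Object(my_bucket, my_file).get()['Body'].read()
doc = etree.XML(xml_bytes)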
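If you simply want to display the file in the notebook, one option is to serialize the parsed tree back to an indented string. The sketch below uses the standard library's ET.indent (Python 3.9+) on the same xml_bytes; pretty_xml is a hypothetical helper name. With lxml, etree.tostring(doc, pretty_print=True) serves a similar purpose.

```python
import xml.etree.ElementTree as ET


def pretty_xml(xml_bytes: bytes) -> str:
    """Return an indented string version of an XML document for display."""
    root = ET.fromstring(xml_bytes)
    # ET.indent (Python 3.9+) inserts newlines and indentation in place
    ET.indent(root, space="  ")
    # encoding="unicode" makes tostring return str instead of bytes
    return ET.tostring(root, encoding="unicode")


print(pretty_xml(b"<note><to>Tove</to><from>Jani</from></note>"))
```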
Assuming your file is in this form:
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
You can then walk the parsed tree with a simple for loop:
# doc.iter() yields the root and all its descendants;
# use "for element in doc:" to visit only the root's direct children
for element in doc.iter():
    print("%s - %s" % (element.tag, (element.text or "").strip()))
The output will be:
note -
to - Tove
from - Jani
heading - Reminder
body - Don't forget me this weekend!
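To extract the data into a more accessible format, a flat document like this one maps naturally onto a Python dict. The sketch below assumes every child tag is unique (repeated tags would overwrite each other) and uses the standard library parser; children_to_dict is a hypothetical helper name. A dict like this can be passed to pandas as pd.DataFrame([record]), and recent pandas versions also offer pandas.read_xml for tabular XML.

```python
import xml.etree.ElementTree as ET


def children_to_dict(xml_bytes: bytes) -> dict:
    """Map each direct child's tag to its whitespace-stripped text.

    Assumes a flat document where every child tag is unique.
    """
    root = ET.fromstring(xml_bytes)
    return {child.tag: (child.text or "").strip() for child in root}


sample = b"""<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""
print(children_to_dict(sample))
```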