How do I find a specific tag in an XML file in Python?

Time:12-29

I have an XML file and am trying to find a specific tag in it, but the tag appears at different depths in the hierarchy. I want to find the "MotionVector" tag and then calculate the average motion vector value for a specific frame type (P, B, or I frame). Part of the XML file is shown below:

<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>

As you can see, the path to the X and Y values is Picture/SubPicture/Slice/MacroBlock/MotionVector/Absolute/X, but sometimes it is Picture/SubPicture/Slice/MacroBlock/SubMacroBlock/MotionVector/Absolute/X. So when I use this code

 abs_x_tag=list(qpy_node.text for qpy_node in root.findall('Picture/SubPicture/Slice/MacroBlock/SubMacroBlock/MotionVector/Absolute/X'))

to extract the X values, it does not find all of them. I also have to calculate motion vectors for different frame types based on this tag

<TypeString>SLICE_TYPE_P</TypeString>

Given these constraints, I do not know how to extract the X and Y values for each frame type separately. I can extract X and Y values with the code above, but I do not know how to select them by frame type. Could you please help me with this issue? Thanks.

CodePudding user response:

Here is an example of how you can parse this XML with BeautifulSoup.

Installing BeautifulSoup and lxml

pip install BeautifulSoup4 lxml

Code:

from bs4 import BeautifulSoup


XML = """
<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>
                </MacroBlock>
            </Slice>
        </SubPicture>
</Picture>
"""

soup = BeautifulSoup(XML, 'xml')

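# Names such as slice_node and slice_type avoid shadowing the built-ins slice and type
slices = soup.find_all('Slice')
for slice_node in slices:
    # find() returns the first TypeString in document order, which is the Slice's own, not the one inside NAL
    slice_type = slice_node.find('TypeString').text
    print(f"Type: {slice_type}")
    # find_all() searches all descendants, so vectors inside SubMacroBlock are found too
    vectors = slice_node.find_all('MotionVector')
    for vector in vectors:
        print("Vector:")
        difference = vector.find('Difference')
        difference_x = difference.find('X').text
        difference_y = difference.find('Y').text

        absolute = vector.find('Absolute')
        absolute_x = absolute.find('X').text
        absolute_y = absolute.find('Y').text

        # At this point you know the slice type and the x, y values (as strings)

        print(f"Difference: {difference_x}, {difference_y}")
        print(f"Absolute: {absolute_x}, {absolute_y}")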

Output:

Type: SLICE_TYPE_P
Vector:
Difference: 184, 149
Absolute: 184, 149
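
To get from this loop to the per-frame-type average the question asks for, the values can be accumulated per slice type instead of printed. The following is only a minimal sketch, assuming the loop appends (slice_type, x, y) tuples to a list; the function name average_by_type and the sample records are hypothetical and not taken from the original file:

```python
from collections import defaultdict


def average_by_type(records):
    """Return {slice_type: (mean_x, mean_y)} for (slice_type, x, y) records."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running sum_x, sum_y, count
    for slice_type, x, y in records:
        acc = sums[slice_type]
        acc[0] += float(x)  # .text values from the parser are strings
        acc[1] += float(y)
        acc[2] += 1
    return {t: (sx / n, sy / n) for t, (sx, sy, n) in sums.items()}


# Hypothetical records, in the shape the parsing loop could collect them.
records = [
    ("SLICE_TYPE_P", "184", "149"),
    ("SLICE_TYPE_P", "180", "151"),
    ("SLICE_TYPE_B", "10", "-4"),
]
averages = average_by_type(records)
print(averages)
```

Keeping running sums rather than full lists is a design choice that may help if the real file contains many macroblocks.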

CodePudding user response:

Here is a simple way to do it with xml.etree.ElementTree; the output is shown below:

import xml.etree.ElementTree as ET

SampleXML = """
<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>
                </MacroBlock>
            </Slice>
        </SubPicture>
</Picture>
"""
# If you are reading from an XML file, use the commented lines below instead and replace <InputXML> with the absolute path to the file.
# tree = ET.parse(r"<InputXML>")
# root = tree.getroot()
root = ET.fromstring(SampleXML)
type_string = root.findall("./SubPicture/Slice/TypeString")
print("TypeString:", type_string[0].text)
# "//" matches MotionVector at any depth below MacroBlock, so vectors nested in SubMacroBlock are found as well.
# (Joining two findall() calls with "or" would return only the first non-empty list and miss mixed cases.)
abs_x_tag = root.findall("./SubPicture/Slice/MacroBlock//MotionVector/Absolute/X")
print("abs_x_tag:", abs_x_tag[0].text)

Output:

TypeString: SLICE_TYPE_P

abs_x_tag: 184
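
The sample above contains a single slice with one nesting, while the question describes vectors under both MacroBlock and MacroBlock/SubMacroBlock and asks for averages per frame type. The following is a rough sketch of how that could look with ElementTree. It assumes the Picture elements sit under a common root (the Root wrapper here is hypothetical), and the SubMacroBlock vector and the B slice are invented sample data:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from statistics import fmean

# Hypothetical sample: <Root> wrapper and the second/third vectors are made up.
SAMPLE = """
<Root>
  <Picture id="1" poc="1">
    <SubPicture structure="0">
      <Slice num="0">
        <TypeString>SLICE_TYPE_P</TypeString>
        <NAL><TypeString>NALU_TYPE_SLICE</TypeString></NAL>
        <MacroBlock num="0">
          <MotionVector list="0">
            <Absolute><X>184</X><Y>149</Y></Absolute>
          </MotionVector>
        </MacroBlock>
        <MacroBlock num="1">
          <SubMacroBlock num="0">
            <MotionVector list="0">
              <Absolute><X>180</X><Y>151</Y></Absolute>
            </MotionVector>
          </SubMacroBlock>
        </MacroBlock>
      </Slice>
    </SubPicture>
  </Picture>
  <Picture id="2" poc="2">
    <SubPicture structure="0">
      <Slice num="0">
        <TypeString>SLICE_TYPE_B</TypeString>
        <MacroBlock num="0">
          <MotionVector list="0">
            <Absolute><X>10</X><Y>-4</Y></Absolute>
          </MotionVector>
        </MacroBlock>
      </Slice>
    </SubPicture>
  </Picture>
</Root>
"""

root = ET.fromstring(SAMPLE)
vectors = defaultdict(list)  # slice type -> list of (x, y)

for slice_node in root.iter("Slice"):
    # A plain tag path matches direct children only, so NAL/TypeString is skipped.
    slice_type = slice_node.findtext("TypeString")
    # ".//" finds MotionVector at any depth below the slice.
    for absolute in slice_node.findall(".//MotionVector/Absolute"):
        x = float(absolute.findtext("X"))
        y = float(absolute.findtext("Y"))
        vectors[slice_type].append((x, y))

averages = {
    slice_type: (fmean([x for x, _ in xy]), fmean([y for _, y in xy]))
    for slice_type, xy in vectors.items()
}
print(averages)
```

When reading from a real file, ET.parse(...).getroot() would take the place of ET.fromstring(SAMPLE); the paths may need adjusting if the actual root element differs from what is assumed here.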
