How do I find a specific tag in an XML file in Python?

I have an XML file and I am trying to find a specific tag in it, but the tag appears at different depths in the hierarchy. I want to find the "MotionVector" tag and then calculate the average motion vector value for a specific frame type (P, B, or I frame). Here is part of the XML file:

<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>

As you can see, the path to the X and Y values is Picture/SubPicture/Slice/MacroBlock/MotionVector/Absolute/X, but sometimes it is Picture/SubPicture/Slice/MacroBlock/SubMacroBlock/MotionVector/Absolute/X. So when I use this code

 abs_x_tag=list(qpy_node.text for qpy_node in root.findall('Picture/SubPicture/Slice/MacroBlock/SubMacroBlock/MotionVector/Absolute/X'))

to extract all X values, it does not find all of them. I also have to calculate motion vectors for different frame types based on this tag

<TypeString>SLICE_TYPE_P</TypeString>

Given these constraints, I do not know how to extract the X and Y values for each frame type separately. I can extract X and Y values with the code above, but I do not know how to group them by frame type. Could you please help me with this issue? Thanks.

CodePudding user response：

Here is an example of how you can parse this XML with BeautifulSoup.

Installing BeautifulSoup and lxml

pip install BeautifulSoup4 lxml

Code:

from bs4 import BeautifulSoup


XML = """
<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>
                </MacroBlock>
            </Slice>
        </SubPicture>
</Picture>
"""

soup = BeautifulSoup(XML, 'xml')

# find_all searches all descendants, so MotionVector tags nested
# under SubMacroBlock are matched as well.
for slice_tag in soup.find_all('Slice'):
    # recursive=False reads the Slice's own TypeString, not the one inside NAL
    slice_type = slice_tag.find('TypeString', recursive=False).text
    print(f"Type: {slice_type}")
    for vector in slice_tag.find_all('MotionVector'):
        print("Vector:")
        difference = vector.find('Difference')
        difference_x = difference.find('X').text
        difference_y = difference.find('Y').text

        absolute = vector.find('Absolute')
        absolute_x = absolute.find('X').text
        absolute_y = absolute.find('Y').text

        # At this point the slice type and the x, y values are all known

        print(f"Difference: {difference_x}, {difference_y}")
        print(f"Absolute: {absolute_x}, {absolute_y}")

Output:

Type: SLICE_TYPE_P
Vector:
Difference: 184, 149
Absolute: 184, 149
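
To go from the printed values to an average per frame type, one option is to collect (type, x, y) records inside the loop and average them afterwards. This is only a minimal sketch, assuming "average motion vector" means the per-component mean of X and Y. The function name average_by_type and all sample records other than 184, 149 are made up for illustration.

```python
from collections import defaultdict


def average_by_type(records):
    """Return {slice_type: (mean_x, mean_y)} for (slice_type, x, y) records.

    x and y may be strings (as read from the XML text) or numbers.
    """
    # Per type: [sum of x, sum of y, number of vectors]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for slice_type, x, y in records:
        acc = sums[slice_type]
        acc[0] += float(x)
        acc[1] += float(y)
        acc[2] += 1
    return {t: (sx / n, sy / n) for t, (sx, sy, n) in sums.items()}


# Hypothetical records; in the loop above you would append
# (slice type, absolute x, absolute y) for every MotionVector.
records = [
    ("SLICE_TYPE_P", "184", "149"),
    ("SLICE_TYPE_P", "186", "151"),
    ("SLICE_TYPE_B", "10", "20"),
]
averages = average_by_type(records)
print(averages)
```

Keeping the accumulation separate from the parsing means the same helper works no matter which parser produced the records.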

CodePudding user response：

With the standard library's xml.etree.ElementTree this can be done in a simple way; see the output below:

import xml.etree.ElementTree as ET

SampleXML = """
<Picture id="1" poc="1">
        <GOPNr>0</GOPNr>
        <SubPicture structure="0">
            <Slice num="0">
                <Type>0</Type>
                <TypeString>SLICE_TYPE_P</TypeString>
                <NAL>
                    <Num>5</Num>
                    <Type>1</Type>
                    <TypeString>NALU_TYPE_SLICE</TypeString>
                    <Length>47048</Length>
                </NAL>
                <MacroBlock num="0">
                    <MotionVector list="0">
                        <RefIdx>0</RefIdx>
                        <Difference>
                            <X>184</X>
                            <Y>149</Y>
                        </Difference>
                        <Absolute>
                            <X>184</X>
                            <Y>149</Y>
                        </Absolute>
                    </MotionVector>
                </MacroBlock>
            </Slice>
        </SubPicture>
</Picture>
"""
# If you are reading from an XML file, use the commented lines below and replace <InputXML> with the absolute path of the XML file
# tree = ET.parse(r"<InputXML>")
# root = tree.getroot()
root = ET.fromstring(SampleXML)
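# This path reaches the Slice's own TypeString, not the one inside NAL
TypeString = root.findall("./SubPicture/Slice/TypeString")
print("TypeString:", TypeString[0].text)
# ".//" matches MotionVector at any depth, so vectors under MacroBlock and under
# MacroBlock/SubMacroBlock are both found; "findall(a) or findall(b)" would return
# only the first non-empty list and miss the rest.
abs_x_tag = root.findall(".//MotionVector/Absolute/X")
print("abs_x_tag:", abs_x_tag[0].text)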

Output:

TypeString: SLICE_TYPE_P

abs_x_tag: 184
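
The question also asks for the values grouped by frame type, with MotionVector sometimes sitting under SubMacroBlock. A possible sketch with ElementTree is below. It assumes the real file has some root element above Picture (called Sequence here, which is a guess), that the X and Y values are integers, and that each Slice has its own TypeString as a direct child. The second MacroBlock, the SubMacroBlock layout, and the B picture are invented sample data, and absolute_vectors_by_type is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample: only the 184/149 vector comes from the question.
SAMPLE = """
<Sequence>
  <Picture id="0" poc="0">
    <SubPicture structure="0">
      <Slice num="0">
        <TypeString>SLICE_TYPE_P</TypeString>
        <NAL><TypeString>NALU_TYPE_SLICE</TypeString></NAL>
        <MacroBlock num="0">
          <MotionVector list="0">
            <Absolute><X>184</X><Y>149</Y></Absolute>
          </MotionVector>
        </MacroBlock>
        <MacroBlock num="1">
          <SubMacroBlock num="0">
            <MotionVector list="0">
              <Absolute><X>186</X><Y>151</Y></Absolute>
            </MotionVector>
          </SubMacroBlock>
        </MacroBlock>
      </Slice>
    </SubPicture>
  </Picture>
  <Picture id="1" poc="1">
    <SubPicture structure="0">
      <Slice num="0">
        <TypeString>SLICE_TYPE_B</TypeString>
        <MacroBlock num="0">
          <MotionVector list="0">
            <Absolute><X>10</X><Y>20</Y></Absolute>
          </MotionVector>
        </MacroBlock>
      </Slice>
    </SubPicture>
  </Picture>
</Sequence>
"""


def absolute_vectors_by_type(root):
    """Return {slice type: [(x, y), ...]} of Absolute motion vectors."""
    vectors = defaultdict(list)
    # iter() walks the whole tree, so the depth of Slice does not matter
    for slice_node in root.iter("Slice"):
        # A bare tag name only looks at direct children, so the
        # TypeString inside NAL is not picked up here
        slice_type = slice_node.findtext("TypeString")
        # ".//" matches MotionVector under MacroBlock and under SubMacroBlock
        for absolute in slice_node.findall(".//MotionVector/Absolute"):
            x = int(absolute.findtext("X"))
            y = int(absolute.findtext("Y"))
            vectors[slice_type].append((x, y))
    return dict(vectors)


root = ET.fromstring(SAMPLE)
vectors = absolute_vectors_by_type(root)
for slice_type, values in vectors.items():
    print(slice_type, values)
```

Once the vectors are grouped like this, the average for one frame type is just the mean of its X values and the mean of its Y values.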