I cannot extract the text from an element using ElementTree

Here is a snippet of my document and the code:

import xml.etree.ElementTree as ET
obj = ET.fromstring("""
   <tab>
    <infos><bounds left="7947" top="88607" width="10086" height="1184" bottom="89790" right="18032" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/>     <prtBounds left="115" top="0" width="9300" height="1169" bottom="1168" right="9414"/> </infos>
    <row > <infos> <bounds left="8062" top="88607" width="9300" height="524" bottom="89130" right="17361" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/>      <prtBounds left="0" top="0" width="9300" height="524" bottom="523" right="9299"/>      </infos>
     <cell ptr="000002232E644270" id="199" symbol="class SwCellFrame" next="202" upper="198" lower="200" rowspan="1"> <infos> <bounds left="8062" top="88607" width="546" height="524" bottom="89130" right="8607" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/>        <prtBounds left="7" top="15" width="532" height="509" bottom="523" right="538"/>  </infos>
      <txt> <infos> <bounds left="8069" top="88622" width="532" height="187" bottom="88808" right="8600" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="0" top="3" width="532" height="184" bottom="186" right="531"/>        </infos>
       <Finish/>
      </txt>
      <txt> <infos> <bounds left="8069" top="88809" width="532" height="149" bottom="88957" right="8600" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="136" top="0" width="396" height="149" bottom="148" right="531"/> </infos>
UDA       <Finish/>
      </txt>
     </cell>
     <cell ptr="000002232E642E40" id="202" symbol="class SwCellFrame" next="205" prev="199" upper="198" lower="203" rowspan="1"> <infos> <bounds left="8608" top="88607" width="3283" height="524" bottom="89130" right="11890" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="true" mbFramePrintAreaValid="true"/> <prtBounds left="7" top="15" width="3269" height="509" bottom="523" right="3275"/> </infos>
      <txt>
       <infos> <bounds left="8615" top="88622" width="3269" height="180" bottom="88801" right="11883" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="0" top="7" width="3269" height="173" bottom="179" right="3268"/> </infos> <Finish/>
      </txt>
      <txt> <infos> <bounds left="8615" top="88802" width="3269" height="149" bottom="88950" right="11883" mbFixSize="false" mbFrameAreaPositionValid="true" mbFrameAreaSizeValid="false" mbFramePrintAreaValid="true"/> <prtBounds left="58" top="0" width="3170" height="149" bottom="148" right="3227"/> </infos>
Nombre       <Finish/>
      </txt>
     </cell>
    </row>
  </tab>
""")
a = obj.findall('./row/cell/txt')
for i, item in enumerate(a):
    print(i, item.text.strip())

If I simplify the document, however, I can extract the text:

obj = ET.fromstring("""
   <tab>
    <row>
     <cell > 
      <txt > <Finish/> </txt>
      <txt > UDA <Finish/> </txt>
     </cell>
     <cell >
      <txt > <Finish/> </txt>
      <txt > Nombre       <Finish/> </txt>
     </cell>
   </row>
  </tab>
""")

a = obj.findall('./row/cell/txt')
for i, item in enumerate(a):
    print(i, item.text.strip())

Output:
0 
1 UDA
2 
3 Nombre

I don't know how to solve this, because my real document is very large and I can't simplify it the way I did in this example.
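The difference between the two versions comes down to where ElementTree stores the word. The sketch below is an illustration added for clarity, not code from the question; the two one-line documents are trimmed stand-ins for the real markup (attributes dropped). It prints the raw `text` and `tail` values so you can see which attribute holds "UDA" in each layout.

```python
import xml.etree.ElementTree as ET

# Simplified layout: the word comes before the first child element.
simple = ET.fromstring("<txt> UDA <Finish/> </txt>")

# Real layout (trimmed): <infos> comes first, so the word follows </infos>.
real = ET.fromstring("<txt> <infos><bounds/></infos>\nUDA       <Finish/>\n</txt>")

# In the simplified layout the word is part of txt.text ...
print(repr(simple.text))   # ' UDA '

# ... but in the real layout txt.text is only the whitespace before <infos>,
print(repr(real.text))     # ' '

# and the word is stored as the tail of <infos> (text after </infos>).
infos = real.find("infos")
print(repr(infos.tail))    # '\nUDA       '
```

An element's `text` only covers the characters before its first child; anything after a child's closing tag belongs to that child's `tail`.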

Answer:

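The "UDA" and "Nombre" strings are not part of the text of the txt elements. In the real document each txt starts with an <infos> child, so item.text only holds the whitespace before <infos>, and the words that follow </infos> are stored in the tail of the infos element. (In the simplified version the words come before the first child, <Finish/>, which is why item.text worked there.) The easiest way to get the wanted output is to use itertext(), which yields the element's own text plus the text and tail of every descendant: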

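a = obj.findall('./row/cell/txt')
for i, item in enumerate(a):
    # itertext() yields item.text plus the text and tail of every descendant,
    # so it also picks up the strings stored in the tail of <infos>
    text = "".join([s.strip() for s in item.itertext()])
    print(i, text)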
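Below is a self-contained sketch of the same idea, run against a trimmed copy of the question's markup (attributes removed, which is an assumption for brevity). `all_text` is the itertext() approach from this answer; `direct_text` is a hypothetical alternative, not part of the original answer, that keeps only text belonging directly to `<txt>` (its own `text` plus each child's `tail`) and skips text nested inside children. It also uses `obj.iter("txt")` instead of a fixed path, so `<txt>` elements are found at any depth.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structure from the question (attributes dropped).
obj = ET.fromstring("""
<tab>
 <row>
  <cell>
   <txt> <infos><bounds/><prtBounds/></infos>
    <Finish/>
   </txt>
   <txt> <infos><bounds/><prtBounds/></infos>
UDA       <Finish/>
   </txt>
  </cell>
  <cell>
   <txt>
    <infos><bounds/><prtBounds/></infos> <Finish/>
   </txt>
   <txt> <infos><bounds/><prtBounds/></infos>
Nombre       <Finish/>
   </txt>
  </cell>
 </row>
</tab>
""")


def all_text(elem):
    """Text of elem plus text and tails of all descendants, each piece stripped."""
    return "".join(s.strip() for s in elem.itertext())


def direct_text(elem):
    """Hypothetical helper: elem.text plus each direct child's tail.

    Unlike itertext(), this ignores text nested inside child elements.
    """
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    return "".join(p.strip() for p in parts)


# iter("txt") walks the whole tree in document order, whatever the nesting.
for i, txt in enumerate(obj.iter("txt")):
    print(i, repr(all_text(txt)), repr(direct_text(txt)))
```

One design caveat: stripping each piece and joining with `""` glues words together when an element holds several text runs (for example, `"Some "`, `"bold"`, `" text"` becomes `"Someboldtext"`). If that can happen in your data, join with a space or strip only the final string.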