How to remove extra space in comment line in XML using Python

Time:10-18

How to remove extra spaces from comment line using lxml

I tried commenting out the Necessary tag using the following code:

tc.getparent().replace(tc, etree.Comment(etree.tostring(tc)))
print(etree.tostring(doc2).decode())

Output:

<List>
    <Item>
        <Price>
            <Amount>100</Amount>
            <Next_Item>
                <Name>Apple</Name>
                <!--<Necessary/>
                -->
            </Next_Item>
            <Next_Item>
                <Name>Orange</Name>
                <!--<Necessary/>
                -->
            </Next_Item>
        </Price>
    </Item>
</List>

I have also tried BeautifulSoup, but the extra whitespace is still there in the comments:

from bs4 import BeautifulSoup

with open('XML1.xml') as f:
    soup = BeautifulSoup(f, 'xml')

for elem in soup.find_all():
    if elem.string is not None:
        elem.string = elem.string.strip()

The required XML is:

<List>
    <Item>
        <Price>
            <Amount>100</Amount>
            <Next_Item>
                <Name>Apple</Name>
                <!--<Necessary/>-->
            </Next_Item>
            <Next_Item>
                <Name>Orange</Name>
                <!--<Necessary/>-->
            </Next_Item>
        </Price>
    </Item>
</List>

My problem is the extra newline inside the comment, between "<Necessary/>" and "-->": the closing "-->" is pushed onto the next line.

Any help would be appreciated.

Answer:

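The "extra" newline inside the comment is the tail of the element you used as the comment text. By default etree.tostring() also serializes an element's tail (with_tail=True), so the string returned by etree.tostring(ele) already contains that whitespace, including the indentation before the next tag.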
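To see where the whitespace comes from, here is a minimal sketch comparing the default serialization with with_tail=False. It assumes lxml is installed; the small inline document is a made-up sample shaped like the question's XML.

```python
from lxml import etree

# Made-up sample shaped like one <Next_Item> from the question
root = etree.fromstring(
    "<Next_Item>\n    <Name>Apple</Name>\n    <Necessary/>\n</Next_Item>"
)
necessary = root.find("Necessary")

# The tail is the text that follows </Necessary> inside its parent
print(repr(necessary.tail))                        # '\n'

# Default: the tail is serialized along with the element
print(etree.tostring(necessary))                   # b'<Necessary/>\n'

# with_tail=False: only the element itself
print(etree.tostring(necessary, with_tail=False))  # b'<Necessary/>'
```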

Serializing the element without its tail (with_tail=False) and moving the original tail onto the new Comment fixes the issue:

>>> doc = etree.parse('test.xml')
>>> for ele in doc.xpath('//Necessary'):
...     t = ele.tail
...     c = etree.Comment(etree.tostring(ele, with_tail=False))
...     c.tail = t
...     ele.getparent().replace(ele, c)
... 
>>> print(etree.tostring(doc).decode())

Result

<List>
  <Item>
    <Price>
      <Amount>100</Amount>
      <Next_Item>
        <Name>Apple</Name>
        <!--<Necessary/>-->
      </Next_Item>
      <Next_Item>
        <Name>Orange</Name>
        <!--<Necessary/>-->
      </Next_Item>
    </Price>
  </Item>
</List>
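For a quick check without a test.xml file, the same approach can be run as a standalone script. This is a sketch that assumes lxml is installed. The inline XML is a trimmed, made-up version of the question's input, and encoding="unicode" is used so the comment text is a str rather than bytes:

```python
from lxml import etree

# Made-up sample standing in for test.xml
xml = b"""<List>
  <Item>
    <Next_Item>
      <Name>Apple</Name>
      <Necessary/>
    </Next_Item>
    <Next_Item>
      <Name>Orange</Name>
      <Necessary/>
    </Next_Item>
  </Item>
</List>"""

doc = etree.fromstring(xml)

# xpath() returns a list, so replacing nodes while looping over it is safe
for ele in doc.xpath("//Necessary"):
    tail = ele.tail  # whitespace between <Necessary/> and </Next_Item>
    # Serialize only the element itself, not its tail
    text = etree.tostring(ele, encoding="unicode", with_tail=False)
    comment = etree.Comment(text)
    comment.tail = tail  # keep the original line break and indentation
    ele.getparent().replace(ele, comment)

result = etree.tostring(doc, encoding="unicode")
print(result)
```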

Answer:

With BeautifulSoup, you could select all comments by checking isinstance(text, Comment) and replace each one with a stripped version:

for c in soup.find_all(string=lambda text: isinstance(text, Comment)):
    c.replace_with(Comment(c.strip()))

Example

from bs4 import BeautifulSoup
from bs4 import Comment

xml = '''
<List>
    <Item>
        <Price>
            <Amount>100</Amount>
            <Next_Item>
                <Name>Apple</Name>
                <!--<Necessary/>
                -->
            </Next_Item>
            <Next_Item>
                <Name>Orange</Name>
                <!--<Necessary/>
                -->
            </Next_Item>
        </Price>
    </Item>
</List>
'''
soup = BeautifulSoup(xml, 'xml')
for c in soup.find_all(string=lambda text: isinstance(text, Comment)):
    c.replace_with(Comment(c.strip()))
print(soup)

Output

<?xml version="1.0" encoding="utf-8"?>
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>-->
</Next_Item>
</Price>
</Item>
</List>
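If the whitespace is already inside comments in the file, the same clean-up can also be done with lxml alone instead of BeautifulSoup. A minimal sketch, assuming lxml is installed and using a made-up inline sample; passing etree.Comment to iter() restricts the loop to comment nodes:

```python
from lxml import etree

# Made-up sample: a comment that already contains the stray whitespace
xml = b"""<Next_Item>
    <Name>Apple</Name>
    <!--<Necessary/>
    -->
</Next_Item>"""

root = etree.fromstring(xml)

# Only comment nodes are yielded; elements are skipped
for comment in root.iter(etree.Comment):
    # Guard against an empty comment whose text may be None
    comment.text = (comment.text or "").strip()

result = etree.tostring(root, encoding="unicode")
print(result)
```

Only the comment text changes; the comment's tail and the surrounding indentation are left as they were.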