How to remove extra spaces from comment line using lxml
I tried commenting out the Necessary tag using the following code:
tc.getparent().replace(tc,etree.Comment(etree.tostring(tc)))
print(etree.tostring(doc2).decode())
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>
-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>
-->
</Next_Item>
</Price>
</Item>
</List>
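To see where the newline comes from, here is a minimal reproduction. The sample XML is a hypothetical, shortened stand-in for the asker's file. It shows that etree.tostring() serializes the element's tail text (the whitespace after <Necessary/>) unless told otherwise, so that whitespace ends up inside the comment.

```python
from lxml import etree

# Hypothetical, shortened stand-in for the asker's document (assumption).
SAMPLE = b"""<List>
  <Item>
    <Price>
      <Amount>100</Amount>
      <Next_Item>
        <Name>Apple</Name>
        <Necessary/>
      </Next_Item>
    </Price>
  </Item>
</List>"""

root = etree.fromstring(SAMPLE)
tc = root.find(".//Necessary")

# By default tostring() includes the element's tail, i.e. the newline and
# indentation that follow <Necessary/> in the source.
serialized = etree.tostring(tc)
print(repr(serialized))

# Same replacement as in the question: the tail whitespace is now comment text.
tc.getparent().replace(tc, etree.Comment(serialized))
print(etree.tostring(root).decode())
```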
I have already tried BeautifulSoup, but the whitespace is still there in the comment:
soup = BeautifulSoup(open('XML1.xml', 'r'), 'xml')
for elem in soup.find_all():
    if elem.string is not None:
        elem.string = elem.string.strip()
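A likely reason this attempt has no effect: called with no arguments, find_all() returns only Tag objects, so the loop never visits the Comment node, and a tag such as Next_Item has several children, so its .string is None. The one-item sample below is hypothetical and only illustrates that behaviour.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical one-item sample containing the already-broken comment (assumption).
xml = """<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>
-->
</Next_Item>"""

soup = BeautifulSoup(xml, "xml")

# With no arguments, find_all() yields Tag objects only.
tags = soup.find_all()
print([t.name for t in tags])
print(any(isinstance(t, Comment) for t in tags))

# Next_Item has more than one child, so .string is None and it is skipped too.
print(soup.Next_Item.string)

# The comment text itself still ends with the newline.
comment = soup.find(string=lambda s: isinstance(s, Comment))
print(repr(str(comment)))
```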
The required XML is as follows:
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>-->
</Next_Item>
</Price>
</Item>
</List>
My problem is the extra newline between "<Necessary/>" and "-->": the "-->" ends up on the next line.
Any help would be appreciated.
Answer:
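The "extra" newline inside the comment is the tail text of the element you turned into a comment. etree.tostring(ele) serializes the tail together with the element by default, so the string passed to etree.Comment() already contains that trailing whitespace, including the indentation before the next tag.
Serializing with with_tail=False and moving the original tail onto the new Comment node fixes the issue: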
>>> doc = etree.parse('test.xml')
>>> for ele in doc.xpath('//Necessary'):
...     t = ele.tail
...     c = etree.Comment(etree.tostring(ele, with_tail=False))
...     c.tail = t
...     ele.getparent().replace(ele, c)
...
>>> print(etree.tostring(doc).decode())
Result
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>-->
</Next_Item>
</Price>
</Item>
</List>
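The same fix as a self-contained script, wrapped in a small helper. The function name comment_out and the sample document are hypothetical, for illustration only. The tail has to be copied explicitly because lxml's replace() removes the old element together with its tail, so the new comment would otherwise have no trailing whitespace at all.

```python
from lxml import etree


def comment_out(root, xpath):
    """Hypothetical helper: replace every element matched by `xpath` with a
    comment containing its serialized markup, keeping its tail outside."""
    for ele in root.xpath(xpath):  # xpath() returns a list, so mutating is safe
        tail = ele.tail  # whitespace that follows the element
        c = etree.Comment(etree.tostring(ele, with_tail=False))
        c.tail = tail  # re-attach the whitespace after the comment
        ele.getparent().replace(ele, c)
    return root


# Hypothetical sample document (assumption).
SAMPLE = b"""<List>
  <Next_Item>
    <Name>Apple</Name>
    <Necessary/>
  </Next_Item>
  <Next_Item>
    <Name>Orange</Name>
    <Necessary/>
  </Next_Item>
</List>"""

root = comment_out(etree.fromstring(SAMPLE), "//Necessary")
print(etree.tostring(root).decode())
```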
Answer:
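With BeautifulSoup you could select all comments by checking whether each string is an instance of Comment,
and replace each one with a stripped version (string= is the current name of the older text= argument):
for c in soup.find_all(string=lambda text: isinstance(text, Comment)):
    c.replace_with(Comment(c.strip()))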
Example
from bs4 import BeautifulSoup
from bs4 import Comment
xml = '''
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>
-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>
-->
</Next_Item>
</Price>
</Item>
</List>
'''
soup = BeautifulSoup(xml, 'xml')
for c in soup.find_all(string=lambda text: isinstance(text, Comment)):
    c.replace_with(Comment(c.strip()))
print(soup)
Output
<?xml version="1.0" encoding="utf-8"?>
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>-->
</Next_Item>
</Price>
</Item>
</List>
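If you want to stay with lxml for a file that already contains the badly formatted comments, the same idea (strip the text of every existing comment) can be sketched with lxml instead of BeautifulSoup. This is a swapped-in variant, not part of the answer above, and the sample input is hypothetical. iter(etree.Comment) yields only comment nodes, and their .text can be reassigned in place.

```python
from lxml import etree

# Hypothetical input that already contains the broken comment (assumption).
BROKEN = b"""<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>
-->
</Next_Item>"""

root = etree.fromstring(BROKEN)

# Visit only comment nodes and strip surrounding whitespace from their text.
for c in root.iter(etree.Comment):
    c.text = c.text.strip()

print(etree.tostring(root).decode())
```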