Is there a way to stop BeautifulSoup from removing leading spaces?

Time:12-21

I am using BeautifulSoup in a project and noticed that it removes leading spaces. For example:

from bs4 import BeautifulSoup

sample = " Test"
soup = BeautifulSoup(sample, features="lxml")
# Remove elements whose contents should not appear in the plain text.
for s in soup(["style", "script", "[document]", "head", "title"]):
    s.extract()
print(soup.get_text(strip=False))

This prints "Test", without the leading space.

I tried setting the strip option to False, but it did not help, and I cannot find any discussion of this behavior anywhere. This is an MWE; the goal is to take HTML-formatted input and print the plain text.

CodePudding user response:

To keep the leading whitespace, you can use html.parser instead of lxml as your parser:

soup = BeautifulSoup(sample, 'html.parser')

See the BeautifulSoup documentation on the differences between parsers:

But if the document is not perfectly-formed, different parsers will give different results...
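As a minimal sketch of the parser swap suggested above, applied to the question's goal of turning HTML input into plain text: the helper name html_to_text is hypothetical (it is not part of BeautifulSoup), and the list of tags to drop is taken from the question.

```python
from bs4 import BeautifulSoup


def html_to_text(markup: str) -> str:
    """Return the plain text of `markup` without stripping whitespace.

    Hypothetical helper for illustration; it uses the built-in
    html.parser backend instead of lxml.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # Drop elements whose contents are not meant to be read as text.
    for tag in soup(["style", "script", "head", "title"]):
        tag.extract()
    # get_text() does not strip by default, so leading spaces are kept.
    return soup.get_text()


print(repr(html_to_text(" Test")))  # ' Test'
```

Since parsers can disagree on malformed input, it is worth checking the output against a few representative documents before relying on it.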
