from bs4 import BeautifulSoup
# Current output:
# 'DOMINGUEZ, JONATHAN D. VS. RAMOS,\nSILVIA M'
# Desired output:
# DOMINGUEZ, JONATHAN D. VS. RAMOS, SILVIA M
x = """<td width="350px" valign="top"
style="padding:.5rem;">
DOMINGUEZ, JONATHAN D. VS. RAMOS,
SILVIA M
</td>"""
soup = BeautifulSoup(x, 'lxml')
print(soup.select_one('td').get_text(strip=True, separator='\n'))
I checked the docs and believe get_text() can do this,
but I am not sure how.
CodePudding user response:
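Change separator='\n'
to separator=' '. Note that the separator is only placed between separate text nodes; a newline inside a single text node is left untouched.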
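For the markup in the question, the line break sits inside one text node, so it can be flattened after extraction by collapsing whitespace. The following is a minimal sketch rather than a definitive fix; the names html, cell, raw and clean are illustrative, and html.parser is swapped in for lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

html = """<td width="350px" valign="top"
style="padding:.5rem;">
DOMINGUEZ, JONATHAN D. VS. RAMOS,
SILVIA M
</td>"""

# Assumption: html.parser instead of the question's 'lxml', to avoid
# requiring lxml for this sketch.
soup = BeautifulSoup(html, 'html.parser')
cell = soup.select_one('td')

# The cell holds a single text node, so the separator is never inserted
# and the embedded newline is still present in the result.
raw = cell.get_text(strip=True, separator=' ')

# str.split() with no arguments splits on any run of whitespace;
# joining with one space flattens the line break.
clean = ' '.join(raw.split())
print(clean)
```

The split/join step also collapses any indentation that follows the line break, which a plain replace('\n', ' ') would leave behind as repeated spaces.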
CodePudding user response:
You can iterate over the stripped_strings
generator and join the pieces:
from bs4 import BeautifulSoup
x = """<td width="350px" valign="top"
style="padding:.5rem;">
DOMINGUEZ, JONATHAN D. VS. RAMOS,
SILVIA M
</td>"""
soup = BeautifulSoup(x, 'lxml')
# Collapse whitespace (including the newline) inside each string, then join with a space
txt = ' '.join(' '.join(s.split()) for s in soup.select_one('td').stripped_strings)
print(txt)
Output:
DOMINGUEZ, JONATHAN D. VS. RAMOS, SILVIA M