I am using BeautifulSoup to parse a webpage.
from bs4 import BeautifulSoup as Soup
How do I get an array of strings from the table headers? I tried something like:
soup = Soup(html, features="html.parser")
headers = soup.select("#StatusGrid thead tr").map(lambda x: x.text)
select() does not seem to return a plain list. Can I inspect the type that it returns?
CodePudding user response:
Without an example of your HTML it is hard to help; consider adding a URL or some sample HTML to the question. That said, to build your list with map(), pass the result of select() to map() as the iterable and wrap the call in list().
Alternatively, use a list comprehension:
[x.text for x in soup.select("#StatusGrid thead tr")]
Checking the type will give you bs4.element.ResultSet:
type(soup.select("#StatusGrid thead tr"))
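To look at the returned object a little more closely, here is a minimal sketch. The sample markup is made up for illustration, since the question does not include any HTML. It prints the class name and checks whether the object can be treated as a list and whether it has a .map() method:

```python
from bs4 import BeautifulSoup as Soup

# Hypothetical sample markup; the question does not show the real page.
html = '<table id="StatusGrid"><thead><tr><td>A</td></tr></thead></table>'
soup = Soup(html, features="html.parser")

rows = soup.select("#StatusGrid thead tr")

print(type(rows).__name__)     # name of the class returned by select()
print(isinstance(rows, list))  # whether it can be used like a list
print(hasattr(rows, "map"))    # whether a .map() method exists
```

As far as I know, ResultSet subclasses list in current bs4 releases, so indexing, len() and iteration work as usual. What it lacks is a .map() method, which is why the built-in map() or a comprehension is needed.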
Example
from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
<thead>
<tr><td>1</td></tr>
<tr><td>2</td></tr>
<tr><td>3</td></tr>
</thead>
</table>
'''
soup = Soup(html, features="html.parser")
print(list(map(lambda x: x.text, soup.select("#StatusGrid thead tr"))))
Output
['1', '2', '3']
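Selecting the rows works when each row holds a single cell. If the real page instead has one header row with several th cells (an assumption, since the actual HTML is not shown), selecting tr would give one concatenated string per row. A sketch that selects the cells instead, using made-up markup:

```python
from bs4 import BeautifulSoup as Soup

# Hypothetical markup: a single header row with <th> cells (not from the question).
html = '''
<table id="StatusGrid">
<thead>
<tr><th> Name </th><th> Status </th><th> Updated </th></tr>
</thead>
</table>
'''
soup = Soup(html, features="html.parser")

# Select the cells rather than the rows, so each header becomes its own string.
# get_text(strip=True) also trims the surrounding whitespace.
headers = [th.get_text(strip=True) for th in soup.select("#StatusGrid thead th")]
print(headers)
```

Adjust the selector to "thead td" if the page uses td cells in its header.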
CodePudding user response:
Use .stripped_strings.
From the docs: .stripped_strings yields Python strings that have had whitespace stripped.
Since it returns a generator, you can convert it to a list to get an array of strings.
Here is how to use it:
from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
<thead>
<tr><td>Heading-1</td></tr>
<tr><td>Heading-2</td></tr>
<tr><td>Heading-3</td></tr>
</thead>
<tbody>
</tbody>
</table>
'''
# "lxml" requires the lxml package to be installed; "html.parser" also works here
soup = Soup(html, features="lxml")
t = soup.find('table', {'id': 'StatusGrid'}).find('thead')
print(list(t.stripped_strings))
Output
['Heading-1', 'Heading-2', 'Heading-3']
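One thing to keep in mind: .stripped_strings yields one string per text node, not per cell. If a header cell contains nested tags, its text is split into several items. The sketch below uses made-up markup with a b tag inside each cell to compare this with calling get_text() per cell:

```python
from bs4 import BeautifulSoup as Soup

# Hypothetical markup with nested tags inside the header cells.
html = '''
<table id="StatusGrid">
<thead>
<tr><td><b>Heading</b> 1</td></tr>
<tr><td><b>Heading</b> 2</td></tr>
</thead>
</table>
'''
soup = Soup(html, features="html.parser")
thead = soup.find("table", {"id": "StatusGrid"}).find("thead")

# One item per text node: the bold part and the number end up separate.
flat = list(thead.stripped_strings)

# One item per cell: the text nodes of each cell are joined with a space.
per_cell = [td.get_text(" ", strip=True) for td in thead.find_all("td")]

print(flat)
print(per_cell)
```

If the header cells only contain plain text, both approaches give the same list, and .stripped_strings is the shorter option.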