Use map with return value of soup.select to get table headers

I am using BeautifulSoup to parse a webpage.

from bs4 import BeautifulSoup as Soup

How do I get a list of strings from the table headers? I tried something like

soup = Soup(html, features="html.parser")
headers = soup.select("#StatusGrid thead tr").map(lambda x: x.text)

Select does not return a list. Can I inspect the type that it returns?

CodePudding user response:

Without an example of your HTML it is hard to help; you could provide a URL or some of the markup. However, to build your list with map(), pass the result of select() to map() as its second argument, since a ResultSet has no .map() method of its own.

Alternatively, use a list comprehension:

[x.text for x in soup.select("#StatusGrid thead tr")]

Checking the type will give you bs4.element.ResultSet, which is a subclass of list:

type(soup.select("#StatusGrid thead tr"))
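As a quick check, here is a minimal sketch (the HTML is made up for illustration) showing that the ResultSet behaves like an ordinary list: it supports len(), indexing, and iteration, and only lacks a map attribute because Python lists have none.

```python
from bs4 import BeautifulSoup as Soup

# Made-up HTML for illustration only.
html = '<table id="StatusGrid"><thead><tr><td>A</td></tr><tr><td>B</td></tr></thead></table>'

soup = Soup(html, features="html.parser")
rows = soup.select("#StatusGrid thead tr")

print(type(rows).__name__)      # ResultSet
print(isinstance(rows, list))   # True
print(len(rows), rows[0].text)  # 2 A
print(hasattr(rows, "map"))     # False
```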

Example

from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
<thead>
<tr><td>1</td></tr>
<tr><td>2</td></tr>
<tr><td>3</td></tr>
</thead>
</table>
'''

soup = Soup(html, features="html.parser")
list(map(lambda x: x.text, soup.select("#StatusGrid thead tr")))

Output

['1', '2', '3']
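If the real page keeps all headers in a single row of <th> cells (an assumption, since the actual HTML was not posted), selecting the row would yield one concatenated string per row. A sketch that selects the cells instead, using hypothetical markup:

```python
from bs4 import BeautifulSoup as Soup

# Hypothetical markup: a single header row made of <th> cells.
html = '''
<table id="StatusGrid">
<thead>
<tr><th> Name </th><th>Status</th><th>Updated</th></tr>
</thead>
</table>
'''

soup = Soup(html, features="html.parser")

# Select the cells rather than the row, and strip surrounding whitespace.
headers = [th.get_text(strip=True) for th in soup.select("#StatusGrid thead th")]
print(headers)  # ['Name', 'Status', 'Updated']
```

If the header cells are <td> elements, as in the example above, change the selector to "#StatusGrid thead td".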

CodePudding user response:

Use `.stripped_strings`

From the docs:

.stripped_strings yields Python strings that have had whitespace stripped.

Since it returns a generator, you can convert it to a list to get a list of strings.

Here is how to use it.

from bs4 import BeautifulSoup as Soup
html='''
<table id="StatusGrid">
    <thead>
        <tr><td>Heading-1</td></tr>
        <tr><td>Heading-2</td></tr>
        <tr><td>Heading-3</td></tr>
    </thead>
    <tbody>
    </tbody>
</table>
'''

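# The "lxml" parser requires the lxml package; "html.parser" also works here.
soup = Soup(html, features="lxml")
# Locate the table by id, then narrow down to its <thead>.
t = soup.find('table', {'id': 'StatusGrid'}).find('thead')
print(list(t.stripped_strings))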

Output

['Heading-1', 'Heading-2', 'Heading-3']
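One thing to keep in mind: because .stripped_strings is an iterator, it can be consumed only once. The sketch below (made-up HTML, with select_one used in place of the chained find calls) stores the list first, then shows that a second pass over the same iterator is empty.

```python
from bs4 import BeautifulSoup as Soup

# Made-up HTML for illustration; note the extra whitespace around the first heading.
html = '''
<table id="StatusGrid">
    <thead>
        <tr><td> Heading-1 </td></tr>
        <tr><td>Heading-2</td></tr>
    </thead>
</table>
'''

soup = Soup(html, features="html.parser")
thead = soup.select_one("#StatusGrid thead")

strings = thead.stripped_strings
headers = list(strings)   # consume the iterator once and keep the result
leftover = list(strings)  # a second pass finds nothing left

print(headers)   # ['Heading-1', 'Heading-2']
print(leftover)  # []
```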
