How to extract text from elements selected by class?

I am trying to extract data from a website using BeautifulSoup.

The website data reads:

<div content-43 class="item-name">This is the text I want to grab</div>

I am currently using:

 item_store = soup.find_all("div",{"class":"item-name"})

However, it returns the entire HTML element, including the div tags, instead of just the text I want.

CodePudding user response:

You have to use .get_text() to extract the text rather than the element itself. Be aware that you have to iterate over the ResultSet returned by find_all() before you can call the method, because it exists on each element and not on the ResultSet.
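A minimal sketch of why the iteration is needed, assuming made-up sample markup that uses the class name from the question: calling .get_text() on the ResultSet itself raises an AttributeError, while calling it on each element works.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the class name comes from the question.
html = '<div class="item-name">first</div><div class="item-name">second</div>'
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("div", {"class": "item-name"})

try:
    # A ResultSet is a list of elements, so it has no get_text() of its own.
    items.get_text()
    raised = False
except AttributeError:
    raised = True

# Calling get_text() on each element of the ResultSet works.
texts = [e.get_text() for e in items]
print(raised, texts)
```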

With find() on a single element:

soup.find("div",{"class":"item-name"}).get_text()

With find_all() on a ResultSet:

[e.get_text() for e in soup.find_all("div",{"class":"item-name"})]

With select() and a CSS selector, again iterating over the results:

[e.get_text() for e in soup.select('div.item-name')]

Example

from bs4 import BeautifulSoup

html = '''
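<div content-43 class="item-name">This is the text I grab with find() and also with find_all()</div>
<div content-43 class="item-name">This is the text I want to grab with find_all() </div>
'''
# Name the parser explicitly to avoid the "no parser was explicitly specified" warning.
soup = BeautifulSoup(html, "html.parser")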

print(soup.find("div",{"class":"item-name"}).get_text())
print([e.get_text() for e in soup.find_all("div",{"class":"item-name"})])

Output

This is the text I grab with find() and also with find_all()

and

['This is the text I grab with find() and also with find_all()',
 'This is the text I want to grab with find_all() ']
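The select() variant mentioned above can be checked the same way. This is only a sketch on made-up markup; the "other" class and the sample strings are assumptions for illustration, and select_one() is used as the CSS-selector counterpart of find().

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; "item-name" comes from the question, "other" is made up.
html = '''
<div class="item-name">first item</div>
<div class="item-name">second item</div>
<div class="other">not selected</div>
'''
soup = BeautifulSoup(html, "html.parser")

# select() returns every match for the CSS selector, so iterate over it.
selected = [e.get_text() for e in soup.select("div.item-name")]

# select_one() returns only the first match, similar to find().
first = soup.select_one("div.item-name").get_text()

print(selected)
print(first)
```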

CodePudding user response:

You should use the .get_text() method or the .text property.
You can print them like this:

for item in item_store:
    print(item.text)
    # print(item.get_text())
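As a rough, self-contained version of the loop above (the padded sample text is made up, and item_store is rebuilt here with the call from the question): .text and .get_text() give the same string, and passing strip=True to get_text() is one way to drop surrounding whitespace if the markup contains it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with padding around the text.
html = '<div class="item-name">  padded text </div>'
soup = BeautifulSoup(html, "html.parser")
item_store = soup.find_all("div", {"class": "item-name"})

for item in item_store:
    # .text is shorthand for calling get_text() with no arguments.
    same = item.text == item.get_text()
    # strip=True removes leading and trailing whitespace from the text.
    cleaned = item.get_text(strip=True)
    print(same, repr(cleaned))
```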