I am trying to extract data from a website using BeautifulSoup.
The website data reads:
<div content-43 class="item-name">This is the text I want to grab</div>
I am currently using:
item_store = soup.find_all("div",{"class":"item-name"})
However, it returns the whole element, including the div tags, instead of just the text I want.
CodePudding user response:
You have to use .get_text() to extract the text rather than the element. Be aware that you have to iterate over the ResultSet returned by find_all() before you can call the method.
With find() on a single element:
soup.find("div",{"class":"item-name"}).get_text()
With find_all() on a ResultSet:
[e.get_text() for e in soup.find_all("div",{"class":"item-name"})]
With select() and CSS selectors, also on a ResultSet:
[e.get_text() for e in soup.select('div.item-name')]
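As a rough sketch of the select() variant, the snippet below runs it against some made-up markup. The HTML and the variable names are hypothetical, and select_one() is used here only to show the single-element counterpart, assuming the class name from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class name mirrors the one used in the question.
html = '''
<div class="item-name">First item</div>
<div class="item-name">Second item</div>
<div class="other">Not selected</div>
'''

soup = BeautifulSoup(html, "html.parser")

# select() returns every match, so iterate before calling get_text().
all_names = [e.get_text() for e in soup.select("div.item-name")]

# select_one() returns only the first match (or None), much like find().
first_name = soup.select_one("div.item-name").get_text()

print(all_names)
print(first_name)
```

If nothing matches, select_one() returns None, so calling get_text() on it would raise an AttributeError.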
Example
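from bs4 import BeautifulSoup

html = '''
<div content-43 class="item-name">This is the text I grab with find() and also with find_all()</div>
<div content-43 class="item-name">This is the text I want to grab with find_all() </div>
'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element
print(soup.find("div",{"class":"item-name"}).get_text())
# find_all() returns a ResultSet, so iterate over it
print([e.get_text() for e in soup.find_all("div",{"class":"item-name"})])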
Output
This is the text I grab with find() and also with find_all()
and
['This is the text I grab with find() and also with find_all()',
'This is the text I want to grab with find_all() ']
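Note that get_text() keeps surrounding whitespace, such as a trailing space inside the div. If that is unwanted, one possible approach is the strip=True argument of get_text(). A minimal sketch, with made-up markup and variable names:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with leading and trailing whitespace inside the div.
html = '<div class="item-name">  Text with padding \n</div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "item-name"})

raw = div.get_text()              # whitespace is preserved
clean = div.get_text(strip=True)  # whitespace around the text is removed

print(repr(raw))
print(repr(clean))
```

For a single text node like this one, the result matches calling .strip() on the raw string.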
CodePudding user response:
You should use the .get_text() method or the .text property. You can print them like this:
for item in item_store:
    print(item.text)
    # print(item.get_text())
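To illustrate that both spellings give the same result, here is a small self-contained sketch. The markup is invented for the example; item_store is built the same way as in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup using the class name from the question.
html = '''
<div class="item-name">Alpha</div>
<div class="item-name">Beta</div>
'''
soup = BeautifulSoup(html, "html.parser")
item_store = soup.find_all("div", {"class": "item-name"})

# .text is a property that calls get_text() with default arguments.
texts_via_property = [item.text for item in item_store]
texts_via_method = [item.get_text() for item in item_store]

print(texts_via_property)
print(texts_via_method)
```

The method form is mainly useful when you want to pass arguments such as a separator; otherwise the two are interchangeable.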