Beautiful Soup: trouble removing HTML tags

Time:02-19

I am new to coding but that will become obvious. I'm trying to extract some text from a webpage.

import requests

from bs4 import BeautifulSoup

am = requests.get(url)

soup = BeautifulSoup(am.content, 'html.parser')

songs = soup.findAll("div", {"class": "songs-list-row__song-name"}, text=True)[0].string

There are 4 instances of text that I want to isolate on this web page. This code only outputs one at a time, depending on the index I put in the brackets before .string. How do I output all four instances as a list?

Thanks.

-I

CodePudding user response:

find_all() returns a ResultSet, and indexing it with [0] picks out a single element. Iterate over the whole ResultSet instead, for example with a list comprehension:

songs = [e.text for e in soup.find_all("div", {"class": "songs-list-row__song-name"}, text=True)]

Note: In new code, use find_all() instead of the older findAll() spelling.

Example

from bs4 import BeautifulSoup

html='''
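<div class="songs-list-row__song-name">song 1</div>
<div class="songs-list-row__song-name">song 2</div>
<div class="songs-list-row__song-name">song 3</div>
<div class="songs-list-row__song-name">song 4</div>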
'''

soup = BeautifulSoup(html, 'html.parser')

songs = [e.text for e in soup.find_all("div", {"class": "songs-list-row__song-name"}, text=True)]

print(songs)

Output

['song 1', 'song 2', 'song 3', 'song 4']
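
Alternative

The same list can probably also be built with a CSS selector. This is only a sketch, not part of the original answer: the markup below is made up to stand in for the real page, and the "other" class is a hypothetical extra element added to show that non-matching divs are skipped. select() takes a CSS selector, and get_text(strip=True) trims whitespace around each name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
html = '''
<div class="songs-list-row__song-name">song 1</div>
<div class="songs-list-row__song-name"> song 2 </div>
<div class="other">not a song</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# "div.songs-list-row__song-name" matches only divs carrying that class.
# get_text(strip=True) returns the element's text without surrounding whitespace.
songs = [e.get_text(strip=True) for e in soup.select("div.songs-list-row__song-name")]

print(songs)
```

Unlike the text=True filter above, this version also keeps divs whose text is wrapped in several nested tags, so it may return more matches on a real page.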