Trouble retrieving artist name from billboard top 100 site using beautiful soup

Time:04-02

I am trying to retrieve the most popular songs from this URL using the Python package BeautifulSoup. When I look up the span that holds the artist name, it finds the proper span, but when I call '.text' on that span, I don't get just the text between the span tags.

Here is my code:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/hot-100/')
soup = BeautifulSoup(r.content, 'html.parser')
result = soup.find_all('div', class_='o-chart-results-list-row-container')
for res in result:
    songName = res.find('h3').text.strip()
    artist = res.find('span',class_='c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only').text
    print("song: " + songName)
    print("artist: " + str(artist))
    print("___________________________________________________")

Which currently prints the following per song:

song: Waiting On A Miracle
artist: <span >

        Stephanie Beatriz
</span>
___________________________________________________

How do I pull only the artist's name?

CodePudding user response:

If a single character in that long class string is off, the search won't match. I'd simplify it: once you have the song title, the artist follows in the next <span> tag. So get the <h3> tag as you already do for the song, then use .find_next() to get the artist:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/hot-100/')
soup = BeautifulSoup(r.content, 'html.parser')
result = soup.find_all('div', class_='o-chart-results-list-row-container')
for res in result:
    songName = res.find('h3').text.strip()
    artist = res.find('h3').find_next('span').text.strip()
    print("song: " + songName)
    print("artist: " + str(artist))
    print("___________________________________________________")

Output:

song: Heat Waves
artist: Glass Animals
___________________________________________________
song: Stay
artist: The Kid LAROI & Justin Bieber
___________________________________________________
song: Super Gremlin
artist: Kodak Black
___________________________________________________
song: abcdefu
artist: GAYLE
___________________________________________________
song: Ghost
artist: Justin Bieber
___________________________________________________
song: We Don't Talk About Bruno
artist: Carolina Gaitan, Mauro Castillo, Adassa, Rhenzy Feliz, Diane Guerrero, Stephanie Beatriz & Encanto Cast
___________________________________________________
song: Enemy
artist: Imagine Dragons X JID
___________________________________________________

....
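
For anyone who wants to try the same idea without hitting the live site, here is a minimal offline sketch. The HTML string is a hypothetical, heavily simplified stand-in for two chart rows (the real page's markup is longer and may change), the class names on the <h3> and <span> are shortened for illustration, and extract_chart is just a name I made up. It also shows why the long class string is fragile: a multi-class string passed to class_ is compared against the exact attribute value, so even reordering the classes stops it from matching, while a single class name still works.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for two chart rows.
# The real Billboard page uses much longer class lists.
HTML = """
<div class="o-chart-results-list-row-container">
  <h3 class="c-title">
    Heat Waves
  </h3>
  <span class="c-label a-font-primary-s">
    Glass Animals
  </span>
</div>
<div class="o-chart-results-list-row-container">
  <h3 class="c-title">
    Super Gremlin
  </h3>
  <span class="c-label a-font-primary-s">
    Kodak Black
  </span>
</div>
"""


def extract_chart(html):
    """Return a list of (song, artist) tuples, one per chart row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("div", class_="o-chart-results-list-row-container")
    chart = []
    for row in rows:
        title_tag = row.find("h3")
        if title_tag is None:
            continue  # skip rows without a title
        # find_next() walks forward in document order from the <h3>.
        # It is not limited to the current row, so this relies on every
        # row having a <span> after its title.
        artist_tag = title_tag.find_next("span")
        artist = artist_tag.get_text(strip=True) if artist_tag else ""
        chart.append((title_tag.get_text(strip=True), artist))
    return chart


soup = BeautifulSoup(HTML, "html.parser")

# Same classes as in the markup, but in a different order: no match.
reordered = soup.find("span", class_="a-font-primary-s c-label")

# A single class name matches any tag that carries that class.
single = soup.find("span", class_="c-label")

if __name__ == "__main__":
    for song, artist in extract_chart(HTML):
        print("song: " + song)
        print("artist: " + artist)
```

Matching on one stable class (or stepping from the <h3> with find_next) keeps the lookup short, which is the main reason to prefer it over pasting the whole class attribute.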