I want to scrape the daily top 200 songs from the Spotify Charts website. I am parsing the page's HTML to get each song's artist, name, and stream count, but the following code returns nothing. How can I get this information with this approach?
for a in soup.find("div",{"class":"Container-c1ixcy-0 krZEp encore-base-set"}):
    for b in a.findAll("main",{"class":"Main-tbtyrr-0 flXzSu"}):
        for c in b.findAll("div",{"class":"Content-sc-1n5ckz4-0 jyvkLv"}):
            for d in c.findAll("div",{"class":"TableContainer__Container-sc-86p3fa-0 fRKUEz"}):
                print(d)
Let's say this is the song list that I want to scrape.
CodePudding user response:
A non-Selenium solution:
import requests
import pandas as pd

# JSON endpoint used by this answer instead of the rendered HTML page
url = 'https://charts-spotify-com-service.spotify.com/public/v0/charts'
response = requests.get(url)

chart = []
# Read the entries of the first chart in the response
for entry in response.json()['chartEntryViewResponses'][0]['entries']:
    chart.append({
        "Rank": entry['chartEntryData']['currentRank'],
        # Join all artist names of a track into one string
        "Artist": ', '.join([artist['name'] for artist in entry['trackMetadata']['artists']]),
        "TrackName": entry['trackMetadata']['trackName']
    })

df = pd.DataFrame(chart)
print(df.to_string(index=False))
OUTPUT:
Rank | Artist | TrackName |
---|---|---|
1 | Bizarrap,Quevedo | Quevedo: Bzrp Music Sessions, Vol. 52 |
2 | Harry Styles | As It Was |
3 | Bad Bunny,Chencho Corleone | Me Porto Bonito |
4 | Bad Bunny | Tití Me Preguntó |
5 | Manuel Turizo | La Bachata |
6 | ROSALÍA | DESPECHÁ |
7 | BLACKPINK | Pink Venom |
8 | David Guetta,Bebe Rexha | I'm Good (Blue) |
9 | OneRepublic | I Ain't Worried |
10 | Bad Bunny | Efecto |
11 | Chris Brown | Under The Influence |
12 | Steve Lacy | Bad Habit |
13 | Bad Bunny,Bomba Estéreo | Ojitos Lindos |
14 | Kate Bush | Running Up That Hill (A Deal With God) - 2018 Remaster |
15 | Joji | Glimpse of Us |
16 | Nicki Minaj | Super Freaky Girl |
17 | Bad Bunny | Moscow Mule |
18 | Rosa Linn | SNAP |
19 | Glass Animals | Heat Waves |
20 | KAROL G | PROVENZA |
21 | Charlie Puth,Jung Kook,BTS | Left and Right (Feat. Jung Kook of BTS) |
22 | Harry Styles | Late Night Talking |
23 | The Kid LAROI,Justin Bieber | STAY (with Justin Bieber) |
24 | Tom Odell | Another Love |
25 | Central Cee | Doja |
26 | Stephen Sanchez | Until I Found You |
27 | Bad Bunny | Neverita |
28 | Post Malone,Doja Cat | I Like You (A Happier Song) (with Doja Cat) |
29 | Lizzo | About Damn Time |
30 | Nicky Youre,dazy | Sunroof |
31 | Elton John,Britney Spears | Hold Me Closer |
32 | Luar La L | Caile |
33 | KAROL G,Maldy | GATÚBELA |
34 | The Weeknd | Die For You |
35 | Bad Bunny,Jhay Cortez | Tarot |
36 | James Hype,Miggy Dela Rosa | Ferrari |
37 | Imagine Dragons | Bones |
38 | Elton John,Dua Lipa,PNAU | Cold Heart - PNAU Remix |
39 | The Neighbourhood | Sweater Weather |
40 | Ghost | Mary On A Cross |
41 | Shakira,Rauw Alejandro | Te Felicito |
42 | Justin Bieber | Ghost |
43 | Bad Bunny,Rauw Alejandro | Party |
44 | Drake,21 Savage | Jimmy Cooks (feat. 21 Savage) |
45 | Doja Cat | Vegas (From the Original Motion Picture Soundtrack ELVIS) |
46 | Camila Cabello,Ed Sheeran | Bam Bam (feat. Ed Sheeran) |
47 | Rauw Alejandro,Lyanno,Brray | LOKERA |
48 | Rels B | cómo dormiste? |
49 | The Weeknd | Blinding Lights |
50 | Arctic Monkeys | 505 |
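The flattening step in the code above can be pulled into a small function so it can be checked without a network call. This is only a sketch: `entries_to_rows` is a hypothetical helper name, the `sample` payload is hand-written to mimic the keys the answer reads (`chartEntryViewResponses`, `entries`, `chartEntryData`, `trackMetadata`), and the use of `.get()` to tolerate missing keys is an added assumption, not something the endpoint is known to require.

```python
def entries_to_rows(payload, chart_index=0):
    """Flatten one chart of the JSON payload into a list of row dicts.

    Hypothetical helper: it only reads the keys used in the answer above
    and returns an empty list when the expected structure is absent.
    """
    charts = payload.get("chartEntryViewResponses") or []
    if chart_index >= len(charts):
        return []

    rows = []
    for entry in charts[chart_index].get("entries", []):
        data = entry.get("chartEntryData", {})
        meta = entry.get("trackMetadata", {})
        # A track can have several artists; join their names into one string.
        artists = [artist.get("name", "") for artist in meta.get("artists", [])]
        rows.append({
            "Rank": data.get("currentRank"),
            "Artist": ", ".join(artists),
            "TrackName": meta.get("trackName"),
        })
    return rows


# Hand-written sample in the shape the answer relies on (not real API output).
sample = {
    "chartEntryViewResponses": [
        {
            "entries": [
                {
                    "chartEntryData": {"currentRank": 1},
                    "trackMetadata": {
                        "trackName": "Quevedo: Bzrp Music Sessions, Vol. 52",
                        "artists": [{"name": "Bizarrap"}, {"name": "Quevedo"}],
                    },
                },
                {
                    "chartEntryData": {"currentRank": 2},
                    "trackMetadata": {
                        "trackName": "As It Was",
                        "artists": [{"name": "Harry Styles"}],
                    },
                },
            ]
        }
    ]
}

rows = entries_to_rows(sample)
for row in rows:
    print(row["Rank"], row["Artist"], row["TrackName"], sep=" | ")
```

With a live response, the same function could be called as `entries_to_rows(response.json())` and the result passed to `pd.DataFrame`.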
CodePudding user response:
In the example link you provided, there aren't 200 songs, but only 50. The following is one way to get those songs:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("window-size=1920,1080")

webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

url = 'https://charts.spotify.com/charts/view/regional-tr-daily/2022-09-14'
browser.get(url)
wait = WebDriverWait(browser, 5)

# Accept the cookie banner if one shows up
try:
    wait.until(EC.element_to_be_clickable((By.ID, "onetrust-accept-btn-handler"))).click()
    print("accepted cookies")
except Exception:
    print('no cookie button')

# Remove the charts header element from the DOM before clicking further down the page
header_to_be_removed = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'header[data-testid="charts-header"]')))
browser.execute_script("""
var element = arguments[0];
element.parentNode.removeChild(element);
""", header_to_be_removed)

# Keep clicking the "load more" button until it no longer becomes clickable
while True:
    try:
        show_more_button = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@data-testid="load-more-entries"]//button')))
        show_more_button.location_once_scrolled_into_view
        t.sleep(5)
        show_more_button.click()
        print('clicked to show more')
        t.sleep(3)
    except TimeoutException:
        print('all done')
        break

# Collect every chart entry that is now present on the page
songs = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li[data-testid="charts-entry-item"]')))
print('we have', len(songs), 'songs')

song_list = []
for song in songs:
    # Scroll each entry into view before reading its text
    song.location_once_scrolled_into_view
    t.sleep(1)
    title = song.find_element(By.CSS_SELECTOR, 'p[class^="Type__TypeElement-"]')
    artist = song.find_element(By.CSS_SELECTOR, 'span[data-testid="artists-names"]')
    song_list.append((artist.text, title.text))
# Tuples are (artist, title), so the column names must be in that order
df = pd.DataFrame(song_list, columns = ['Artist', 'Title'])
print(df)
This prints the following in the terminal:
no cookie button
clicked to show more
clicked to show more
clicked to show more
clicked to show more
all done
we have 50 songs
| | Artist | Title |
---|---|---|
0 | Bizarrap, | Quevedo: Bzrp Music Sessions, Vol. 52 |
1 | Harry Styles | As It Was |
2 | Bad Bunny, | Me Porto Bonito |
3 | Bad Bunny | Tití Me Preguntó |
4 | Manuel Turizo | La Bachata |
5 | ROSALÍA | DESPECHÁ |
6 | BLACKPINK | Pink Venom |
7 | David Guetta, | I'm Good (Blue) |
8 | OneRepublic | I Ain't Worried |
9 | Bad Bunny | Efecto |
10 | Chris Brown | Under The Influence |
11 | Steve Lacy | Bad Habit |
12 | Bad Bunny, | Ojitos Lindos |
13 | Kate Bush | Running Up That Hill (A Deal With God) - 2018 Remaster |
14 | Joji | Glimpse of Us |
15 | Nicki Minaj | Super Freaky Girl |
16 | Bad Bunny | Moscow Mule |
17 | Rosa Linn | SNAP |
18 | Glass Animals | Heat Waves |
19 | KAROL G | PROVENZA |
20 | Charlie Puth, | Left and Right (Feat. Jung Kook of BTS) |
21 | Harry Styles | Late Night Talking |
22 | The Kid LAROI, | STAY (with Justin Bieber) |
23 | Tom Odell | Another Love |
24 | Central Cee | Doja |
25 | Stephen Sanchez | Until I Found You |
26 | Bad Bunny | Neverita |
27 | Post Malone, | I Like You (A Happier Song) (with Doja Cat) |
28 | Lizzo | About Damn Time |
29 | Nicky Youre, | Sunroof |
30 | Elton John, | Hold Me Closer |
31 | Luar La L | Caile |
32 | KAROL G, | GATÚBELA |
33 | The Weeknd | Die For You |
34 | Bad Bunny, | Tarot |
35 | James Hype, | Ferrari |
36 | Imagine Dragons | Bones |
37 | Elton John, | Cold Heart - PNAU Remix |
38 | The Neighbourhood | Sweater Weather |
39 | Ghost | Mary On A Cross |
40 | Shakira, | Te Felicito |
41 | Justin Bieber | Ghost |
42 | Bad Bunny, | Party |
43 | Drake, | Jimmy Cooks (feat. 21 Savage) |
44 | Doja Cat | Vegas (From the Original Motion Picture Soundtrack ELVIS) |
45 | Camila Cabello, | Bam Bam (feat. Ed Sheeran) |
46 | Rauw Alejandro, | LOKERA |
47 | Rels B | cómo dormiste? |
48 | The Weeknd | Blinding Lights |
49 | Arctic Monkeys | 505 |
Of course, you can also extract other information, such as the chart ranking or all artists when there is more than one.
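As one possible way to add the chart ranking without touching the page again, the list of `(artist, title)` tuples built in the loop above can be post-processed. This is a rough sketch under two assumptions that the answer does not confirm: that the entries are collected in chart order, so the position in the list equals the rank, and that the trailing comma seen in artist names such as `Bizarrap,` is leftover text that can be stripped. `add_rank_and_clean` is a hypothetical helper name, and it does not recover the co-artists missing from the scraped text.

```python
def add_rank_and_clean(song_list):
    """Turn (artist, title) tuples into dicts with a 1-based rank.

    Assumes the tuples are already in chart order (an assumption, see above).
    """
    cleaned = []
    for rank, (artist, title) in enumerate(song_list, start=1):
        cleaned.append({
            "Rank": rank,
            # Drop surrounding whitespace and a trailing comma, if any.
            "Artist": artist.strip().rstrip(",").strip(),
            "Title": title.strip(),
        })
    return cleaned


# Two tuples copied from the printed output, in the (artist, title) order
# used by song_list.append(...) in the loop above.
sample_song_list = [
    ("Bizarrap,", "Quevedo: Bzrp Music Sessions, Vol. 52"),
    ("Harry Styles", "As It Was"),
]

ranked = add_rank_and_clean(sample_song_list)
for row in ranked:
    print(row["Rank"], row["Artist"], row["Title"], sep=" | ")
```

The resulting list of dicts can be passed straight to `pd.DataFrame`, which takes its column names from the dict keys.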
The Selenium Chrome/chromedriver setup shown here is for Linux. To adapt it to your own setup, keep the imports and the code that follows the browser definition.
Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html
For selenium docs, visit: https://www.selenium.dev/documentation/