import requests
from bs4 import BeautifulSoup
Year = input("What year would you like to travel to? YYYY-MM-DD ")
URL = "https://www.billboard.com/charts/hot-100/"
URL += URL + Year
response = requests.get(URL)
data = response.text
soup = BeautifulSoup(data,"html.parser")
songs = soup.find_all(name='h3', id="title-of-a-story")
all_songs = [song.getText() for song in songs]
print(all_songs)
I'm new to web scraping. It's supposed to give me the list of songs in the Hot 100 for the date I specify, but why is it giving me news? It's giving me the wrong data.
CodePudding user response:
Try printing URL before making the request:
https://www.billboard.com/charts/hot-100/https://www.billboard.com/charts/hot-100/2022-01-01
That's clearly wrong: the base part appears twice. The line URL += URL + Year is the culprit; it should have been URL = URL + Year.
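One way to guard against a doubled URL like this is to build it in a small helper and print it before requesting anything. This is only a minimal sketch: `BASE_URL` and `build_chart_url` are hypothetical names that are not in the original code, and validating the date with `datetime.strptime` is an added assumption.

```python
from datetime import datetime

BASE_URL = "https://www.billboard.com/charts/hot-100/"


def build_chart_url(date_text: str) -> str:
    """Return the chart URL for a date given as YYYY-MM-DD."""
    # strptime raises ValueError if the text is not a valid YYYY-MM-DD date.
    parsed = datetime.strptime(date_text.strip(), "%Y-%m-%d")
    # Append the date exactly once; the base is never added to itself.
    return BASE_URL + parsed.strftime("%Y-%m-%d")


url = build_chart_url("2022-01-01")
print(url)  # check the URL by eye before passing it to requests.get
```

Keeping the base in a constant that is never reassigned makes it harder to append it to itself by accident.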
CodePudding user response:
Adding to what @Sasszem mentioned above:
import requests
from bs4 import BeautifulSoup
Year = input("What year would you like to travel to? YYYY-MM-DD ")
URL = "https://www.billboard.com/charts/hot-100/"
URL = URL + Year
response = requests.get(URL)
data = response.text
songs = []
soup = BeautifulSoup(data,"html.parser")
# instead of directly jumping to the element, I found the container element first to restrict the code to a specific section of the website
container = soup.find_all(class_='lrv-a-unstyle-list lrv-u-flex lrv-u-height-100p lrv-u-flex-direction-column@mobile-max')
for x in container:
    song = x.find(id="title-of-a-story")  # locate the element that holds the title text within this container
    songs.append(song)
all_songs = [song.getText() for song in songs]  # collect all the song titles in a list
print(all_songs)  # ['\n\n\t\n\t\n\t\t\n\t\t\t\t\tAll I Want For Christmas Is You\t\t\n\t\n'] every title comes with a weird prefix and suffix of whitespace characters
# remove the prefix and suffix strings
final_output = []
for i in all_songs:
    final_output.append(i[14:-5])
print(final_output)
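The fixed slice `[14:-5]` only works while the page emits exactly that much whitespace around each title. A less brittle cleanup could use `str.strip()` instead. This is just a sketch: `raw_titles` is sample data copied from the printed output above, and `clean_titles` is an illustrative name.

```python
# Raw title copied from the printed output shown above.
raw_titles = ["\n\n\t\n\t\n\t\t\n\t\t\t\t\tAll I Want For Christmas Is You\t\t\n\t\n"]

# str.strip() with no arguments drops all leading and trailing whitespace,
# however many tabs or newlines there happen to be.
clean_titles = [title.strip() for title in raw_titles]
print(clean_titles)  # ['All I Want For Christmas Is You']
```

With BeautifulSoup, calling `song.get_text(strip=True)` when collecting the titles should give the same result without a second pass.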