web scraper returns wrong data

Time:07-31

import requests
from bs4 import BeautifulSoup
Year = input("What year would you like to travel to? YYYY-MM-DD ")

URL = "https://www.billboard.com/charts/hot-100/"
URL += URL + Year
response = requests.get(URL)
data = response.text


soup = BeautifulSoup(data,"html.parser")
songs = soup.find_all(name='h3', id="title-of-a-story")

all_songs = [song.getText() for song in songs]
print(all_songs)

I'm new to web scraping. It's supposed to give me the list of songs in the top 100 for the date that I specify, but it's giving me news instead. Why is it returning the wrong data?

CodePudding user response:

Try printing URL before making a request:

https://www.billboard.com/charts/hot-100/https://www.billboard.com/charts/hot-100/2022-01-01

That's clearly wrong: the base part appears twice. The line URL += URL + Year is the culprit; it should have been URL = URL + Year.
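
To see the difference without making any request, here is a minimal sketch that builds the URL both ways. The helper name build_chart_url and the constant BASE_URL are made up for illustration; only the base URL and the sample date come from the question and the output above.

```python
BASE_URL = "https://www.billboard.com/charts/hot-100/"


def build_chart_url(date: str, base: str = BASE_URL) -> str:
    """Return the chart URL for a YYYY-MM-DD date string (hypothetical helper)."""
    return base + date


# Augmented assignment appends (url + date) to the existing url,
# so the base ends up in the result twice.
buggy = BASE_URL
buggy += buggy + "2022-01-01"

# Plain concatenation appends the date exactly once.
fixed = build_chart_url("2022-01-01")

print(buggy)
print(fixed)
```

Printing both values side by side makes the duplicated base easy to spot before any request is sent.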

CodePudding user response:

Adding to what @Sasszem mentioned above:

import requests
from bs4 import BeautifulSoup
Year = input("What year would you like to travel to? YYYY-MM-DD ")

URL = "https://www.billboard.com/charts/hot-100/"
URL = URL + Year

response = requests.get(URL)
data = response.text

songs = []
soup = BeautifulSoup(data,"html.parser")

# instead of directly jumping to the element, I found the container element first to restrict the code to a specific section of the website
container = soup.find_all(class_='lrv-a-unstyle-list lrv-u-flex lrv-u-height-100p lrv-u-flex-direction-column@mobile-max')

for x in container:
    song = x.find(id="title-of-a-story")  # locate the title element inside this container
    if song is not None:  # skip containers that have no title element
        songs.append(song)

all_songs = [song.getText() for song in songs]  # collect all the song titles in a list
print(all_songs)  # e.g. ['\n\n\t\n\t\n\t\t\n\t\t\t\t\tAll I Want For Christmas Is You\t\t\n\t\n']: every title is padded with tabs and newlines

# remove the whitespace padding; strip() works no matter how many tabs/newlines surround a title
final_output = []
for i in all_songs:
    final_output.append(i.strip())

print(final_output)
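
The raw titles come back padded with tabs and newlines, and nothing guarantees that every entry has the same amount of padding. A small sketch on sample strings, comparing fixed slice offsets with str.strip(). The first string is the one shown in the output above; the second title and its padding are hypothetical, made up to show what happens when the padding differs.

```python
raw_titles = [
    # padding as shown in the printed output: 14 characters before, 5 after
    "\n\n\t\n\t\n\t\t\n\t\t\t\t\tAll I Want For Christmas Is You\t\t\n\t\n",
    # hypothetical entry with a different amount of padding
    "\n\tExample Song\n",
]

# Fixed offsets only work when every title has exactly the same padding.
sliced = [title[14:-5] for title in raw_titles]

# strip() removes whatever leading and trailing whitespace is present.
stripped = [title.strip() for title in raw_titles]

print(sliced)
print(stripped)
```

Both approaches agree on the first title, but only strip() recovers the second one.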