I am trying to make web scraper using Python and the basic concept I am using here is,
create empty list --> use 'for loop' to loop through the element on the web page. --> append that info in the empty list --> convert that list to row and column using pandas --> finally to a csv.
the code that I made is
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
headers = {"Accept-Language": "en-US, en;q=0.5"}
url = "https://www.imdb.com/find?q=top 1000 movies&ref_=nv_sr_sm"
results=requests.get(url,headers=headers)
soup=BeautifulSoup(results.text,"html.parser")
# print(soup.prettify())
#initializing empty lists where the data will go
titles =[]
years = []
times = []
imdb_rating = []
metascores = []
votes = []
us_gross = []
movie_div = soup.find_all('div',class_='lister-list')
#initiating the loop for scraper
for container in movie_div:
#tiles
name=container.tr.td.a.text
titles.append(name)
print(titles)
the website I want to scrap is 'https://www.imdb.com/chart/top/?ref_=nv_mv_250'. I need help to know how can i give correct path to the variable 'name', so that i can extract the name of the movie given in name_of_movei, in the HTML script of the page. Because each time I am getting output as empty list.
CodePudding user response:
This example will parse name
, year
, rating
from the table and creates a dataframe from it:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://www.imdb.com/chart/top/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
all_data = []
for row in soup.select(".lister-list > tr"):
name = row.select_one(".titleColumn a").text.strip()
year = row.select_one(".titleColumn .secondaryInfo").text.strip()
rating = row.select_one(".imdbRating").text.strip()
# ...other variables
all_data.append([name, year, rating])
df = pd.DataFrame(all_data, columns=["Name", "Year", "Rating"])
print(df.head().to_markdown(index=False))
Prints:
Name | Year | Rating |
---|---|---|
The Shawshank Redemption | (1994) | 9.2 |
The Godfather | (1972) | 9.2 |
The Dark Knight | (2008) | 9 |
The Godfather: Part II | (1974) | 9 |
12 Angry Men | (1957) | 8.9 |