I'm very new to web scrapping in python. I want to extract the movie name, release year, and ratings from the IMDB database. This is the website for IMBD with 250 movies and ratings https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm.I use the module, BeautifulSoup, and request. Here is my code
movies = bs.find('tbody',class_='lister-list').find_all('tr')
When I tried to extract the movie name, rating & year, I got the same attribute error for all of them.
<td >
<a href="/title/tt11564570/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=ea4e08e1-c8a3-47b5-ac3a-75026647c16e&pf_rd_r=BQWZRBFAM81S7K6ZBPJP&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1" title="Rian Johnson (dir.), Daniel Craig, Edward Norton">Glass Onion: une histoire à couteaux tirés</a>
<span >(2022)</span>
<div >1
<span >(
<span ></span>
1)</span>
<td >
<strong title="7,3 based on 207 962 user ratings">7,3</strong>strong text
title = movies.find('td',class_='titleColumn').a.text
rating = movies.find('td',class_='ratingColumn imdbRating').strong.text
year = movies.find('td',class_='titleColumn').span.text.strip('()')
AttributeError Traceback (most recent call last) <ipython-input-9-2363bafd916b> in <module> ----> 1 title = movies.find('td',class_='titleColumn').a.text 2 title
~\anaconda3\lib\site-packages\bs4\element.py in getattr(self, key) 2287 def getattr(self, key): 2288 """Raise a helpful exception to explain a common code fix.""" -> 2289 raise AttributeError( 2290 "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key 2291 )
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Can someone help me to solve the problem? Thanks in advance!
CodePudding user response:
To get the ResultSets as list, you can try the next example.
from bs4 import BeautifulSoup
import requests
import pandas as pd
data = []
res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm.I")
#print(res)
soup = BeautifulSoup(res.content, "html.parser")
for card in soup.select('.chart.full-width tbody tr'):
data.append({
"title": card.select_one('.titleColumn a').get_text(strip=True),
"year": card.select_one('.titleColumn span').text,
'rating': card.select_one('td[]').get_text(strip=True)
})
df = pd.DataFrame(data)
print(df)
#df.to_csv('out.csv', index=False)
Output:
title year rating
0 Avatar: The Way of Water (2022) 7.9
1 Glass Onion (2022) 7.2
2 The Menu (2022) 7.3
3 White Noise (2022) 5.8
4 The Pale Blue Eye (2022) 6.7
.. ... ... ...
95 Zoolander (2001) 6.5
96 Once Upon a Time in Hollywood (2019) 7.6
97 The Lord of the Rings: The Fellowship of the Ring (2001) 8.8
98 New Year's Eve (2011) 5.6
99 Spider-Man: No Way Home (2021) 8.2
[100 rows x 3 columns]
Update: To extract data using find_all and find
method.
from bs4 import BeautifulSoup
import requests
import pandas as pd
headers = {'User-Agent':'Mozilla/5.0'}
data = []
res = requests.get("https://www.imdb.com/chart/moviemeter/?ref_=nv_mv_mpm.I")
#print(res)
soup = BeautifulSoup(res.content, "html.parser")
for card in soup.table.tbody.find_all("tr"):
data.append({
"title": card.find("td",class_="titleColumn").a.get_text(strip=True),
"year": card.find("td",class_="titleColumn").span.get_text(strip=True),
'rating': card.find('td',class_="ratingColumn imdbRating").get_text(strip=True)
})
df = pd.DataFrame(data)
print(df)
CodePudding user response:
AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
find_all
returns an array, meaning that movies
is an array. You need to iterate over the array with for movie in movies:
for movie in movies:
title = movie.find('td',class_='titleColumn').a.text
rating = movie.find('td',class_='ratingColumn imdbRating').strong.text
year = movie.find('td',class_='titleColumn').span.text.strip('()')