How to traverse imdb movie reviews?

Time:10-28

I want to download some movie reviews from IMDb so that I can use them for my LDA model (for school).

But the default reviews page shows only 25 reviews (e.g. https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv). To get more, I need to press the "Load more" button at the bottom of the page, which loads 25 more reviews.

I don't know how to automate this in Python. I can't append /2 to https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv or add a ?page=2 parameter.

How can I automate traversing the pages of an IMDb reviews listing using Python?

Also, is this deliberately made so difficult?

CodePudding user response:

When I click "Load more", DevTools in Chrome/Firefox (tab: Network, filter: XHR) shows a link like

https://www.imdb.com/title/tt0111161/reviews/_ajax?ref_=undefined&paginationKey=g4xolermtiqhejcxxxgs753i36t52q343mpt34pjada6qpye4w6qtalmfyy7wfxcwfzuwsyh

and it has paginationKey=g4x...

and I see something similar in the HTML: <div ... data-key="g4x..." - so I use this data-key to build the link for the next page.
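Building that link can be sketched as below. This is only an illustration: the helper name next_page_url and the key 'EXAMPLE_KEY' are made up here, and the _ajax endpoint and its parameter names are simply copied from the URL seen in DevTools, so they may change whenever IMDb changes the page.

```python
from urllib.parse import urlencode


def next_page_url(title_id, data_key, ref='undefined'):
    """Build the "Load more" URL from a data-key (hypothetical helper)."""
    base = f'https://www.imdb.com/title/{title_id}/reviews/_ajax'
    # urlencode escapes the values, so the key is safe to put in a query string
    return base + '?' + urlencode({'ref_': ref, 'paginationKey': data_key})


# 'EXAMPLE_KEY' is a placeholder, not a real pagination key
print(next_page_url('tt0111161', 'EXAMPLE_KEY'))
```

With requests you don't have to build the string yourself: passing the same dictionary as params= does the encoding for you.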


Example code.

First I get the HTML from the normal URL and extract the review titles. Next I get the data-key and build the URL for the next batch of reviews. I repeat this in a for-loop to get 3 more pages, but you could use a while True loop and repeat it as long as there is still a data-key.

import requests
from bs4 import BeautifulSoup 

s = requests.Session()
#s.headers['User-Agent'] = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:93.0) Gecko/20100101 Firefox/93.0'

# get first/full page

url = 'https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv'

r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

items = soup.find_all('a', {'class': 'title'})
for number, title in enumerate(items, 1):
    print(number, '>', title.text.strip())
    
# get next page(s)

for _ in range(3):

    div = soup.find('div', {'data-key': True})
    if div is None:  # no data-key means there are no more pages
        break
    print('---', div['data-key'], '---')

    url = 'https://www.imdb.com/title/tt0111161/reviews/_ajax'
    
    payload = {
        'ref_': 'tt_ql_urv',
        'paginationKey': div['data-key']
    }

    #headers = {'X-Requested-With': 'XMLHttpRequest'}
    
    r = s.get(url, params=payload) #, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    
    items = soup.find_all('a', {'class': 'title'})
    for number, title in enumerate(items, 1):
        print(number, '>', title.text.strip())
    

Result:

1 > Enthralling, fantastic, intriguing, truly remarkable!
2 > "I Had To Go To Prison To Learn To Be A Crook"
3 > Masterpiece
4 > All-time prison film classic
5 > Freeman gives it depth
6 > impressive
7 > Simply a great story that is moving and uplifting
8 > An incredible movie. One that lives with you.
9 > "I'm a convicted murderer who provides sound financial planning".
10 > IMDb and the Greatest Film of All Time
11 > never give up hope
12 > The Shawshank Redemption
13 > Brutal Anti-Bible Bigotry Prevails Again
14 > Time and Pressure.
15 > A classic
16 > An extraordinary and unforgettable film about a bank veep who is convicted of murders and sentenced to the toughest prison
17 > A genre picture, but a satisfying one...
18 > Why it is ranked so highly.
19 > Exceptional
20 > Shawshank Redemption- Prison Film is Redeemed by Quality ****
21 > A Classic Film On Hope And Redemption
22 > Compelling masterpiece
23 > Relentless Storytelling
24 > Some birds aren't meant to be caged.
25 > Good , But It Is Overrated By Some
--- g4xolermtiqhejcxxxgs753i36t52q343mpt34pjada6qpye4w6qtalmfyy7wfxcwfzuwsyh ---
1 > Stephen King's prison tale with a happy ending...
2 > Breaking Big Rocks Into Little Rocks
3 > Over Rated
4 > Terrific stuff!
5 > Highly Overrated But Still Good
6 > Superb
7 > Beautiful movie
8 > Tedious, overlong, with "hope" being the second word spoken in just about every sentence... who cares?
9 > Excellent Stephen King adaptation; flawless Robbins & Freeman
10 > Good for the spirit
11 > Entertaining Prison Movie Isn't Nearly as Good as Its Rabid Fan Base Would Lead You to Believe
12 > Observations...
13 > Why can't they make films like this anymore?
14 > Shawshank Redemption Comes Out Clean
15 > Hope Springs Eternal:Rita Hayworth And The Shawshank Redemption.
16 > Redeeming.
17 > You don't understand! I'm not supposed to be here!
18 > A Story Of Hope & Resilence
19 > Salvation lies within....
20 > Pretty good movie...one of those that you do not really need to watch from beginning to end.
21 > A film of Eloquence
22 > A great film of a helping hand leading to end-around justice
23 > about freedom
24 > Reputation notwithstanding, this is powerful stuff
25 > The best film ever made!
--- g4uorermtiqhejcxxxgs753i36t52q343eod34plapeoqp27z6b2lhdevccn5wyrz2vmgufh ---
1 > A Sort of Secular Redemption
2 > In virus times, we need this hope.
3 > The placement isn't an exaggeration
4 > A true story of friendship and hard times
5 > Escape from Shawshank
6 > Great Story Telling
7 > moving and emotionally impactful(if you liked "The Green Mile" you will like this movie)
8 > Super Good - Highly Recommended
9 > I can see why this is rated Number 1 on IMDb.

# ...
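The "repeat while there is still a data-key" variant could look roughly like the sketch below. It makes a few assumptions: the names ReviewPageParser, iter_review_titles, fetch and max_pages are invented for this example, it uses the standard library html.parser instead of BeautifulSoup so it runs without extra packages, and it relies on the same markup as above (a class="title" links and a div with data-key), which may no longer match the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class ReviewPageParser(HTMLParser):
    """Collects review titles and the first data-key found on a page."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.next_key = None
        self._in_title = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'title' in classes:
            self._in_title = True
            self._buffer = []
        if tag == 'div' and attrs.get('data-key') and self.next_key is None:
            self.next_key = attrs['data-key']

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_title:
            self.titles.append(''.join(self._buffer).strip())
            self._in_title = False


def iter_review_titles(fetch, title_id, max_pages=100):
    """Yield titles page by page until no data-key is left.

    fetch is any callable that takes a URL and returns the HTML text;
    max_pages is a safety cap so a bad page cannot loop forever.
    """
    base = f'https://www.imdb.com/title/{title_id}/reviews/'
    url = base
    pages = 0
    while pages < max_pages:
        parser = ReviewPageParser()
        parser.feed(fetch(url))
        yield from parser.titles
        pages += 1
        if not parser.next_key:  # no data-key: this was the last page
            break
        url = base + '_ajax?' + urlencode({'paginationKey': parser.next_key})
```

With the session from the code above it could be called as iter_review_titles(lambda u: s.get(u).text, 'tt0111161', max_pages=4). Passing fetch in as a function also makes it easy to try the loop on saved HTML without hitting the site.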