Selenium - clicking page number changes page but does not reload/populate data

I'm trying to scrape user comments (see disclaimer below). The comments are spread over several pages with numbered pagination buttons.

I'm getting the different numbered elements and just clicking on the next button (>). The page number does change, but the new data does not populate.

Here is a short excerpt of the code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

DRIVER_PATH = '***/chromedriver.exe'
# executable_path is deprecated in Selenium 4; pass the driver path through a Service object instead
driver = webdriver.Chrome(service=Service(DRIVER_PATH))

URL = "https://www.kbb.com/mercedes-benz/cla/2018/consumer-reviews/"
driver.get(URL)
time.sleep(5)
button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//button[@]')))
button.click()

# note: WebDriverWait(driver, 50) by itself does not wait; waiting only happens inside .until(...)
WebDriverWait(driver, 50)

# driver.close()

What can I do to make the fields reload properly? I appreciate any info I can get :-)

Disclaimer: This is a first test for a research project; there will be no illegal scraping without permission and no misuse of data!

Answer:

The page data is rendered dynamically, so you can get it directly from the site's GraphQL API and iterate over the page parameter. Alternatively, just raise the number of reviews per page (perPage) and get everything in a single request (provided there are 100 or fewer reviews).

import requests
import pandas as pd

url = 'https://www.kbb.com/ymm/api/'
payload = {
    "operationName":"consumerReviewsQuery",
    "variables":{
        "year":"2018",
        "make":"mercedes-benz",
        "model":"cla",
        "page":1,
        "perPage":100,
        "bodystyle":"Sedan",
        "sort":"1",
        "filter":"",
        "trendingTopic":""
        },
    "query":"query consumerReviewsQuery($year: String, $make: String!, $model: String!, $page: Int!, $perPage: Int!, $isInitialLoad: Boolean, $priceType: String, $bodystyle: String, $vehicleId: String, $trim: String, $sort: String, $trendingTopic: String, $filter: String) {\n  consumerreviews(\n    year: $year\n    make: $make\n    model: $model\n    page: $page\n    perPage: $perPage\n    isInitialLoad: $isInitialLoad\n    priceType: $priceType\n    bodystyle: $bodystyle\n    vehicleId: $vehicleId\n    trim: $trim\n    sort: $sort\n    trendingTopic: $trendingTopic\n    filter: $filter\n  ) {\n    numPages\n    totalReviews\n    reviews {\n      id\n      nickname\n      nicknameDisplay\n      location\n      anonymous\n      email\n      sessionId\n      visitorId\n      sessionCount\n      friendlyOwnershipStatus\n      year\n      model\n      make\n      vehicleId\n      title\n      reviewText\n      ratingOverall\n      ratingValue\n      ratingReliability\n      ratingPerformance\n      ratingStyling\n      ratingComfort\n      ratingQuality\n      submissionDate\n      positiveLink\n      negativeLink\n      numPositiveFeedbacks\n      numNegativeFeedbacks\n      numFeedbacks\n      pros\n      cons\n      areProsOrConsAvailable\n      __typename\n    }\n    searchTerms\n    __typename\n  }\n}"}

jsonData = requests.post(url, json=payload).json()


reviews = pd.DataFrame(jsonData['data']['consumerreviews']['reviews'])

Output:

print(reviews)
           id             nickname  ... areProsOrConsAvailable __typename
0   187159459              Love it  ...                   True    Reviews
1   179266834               Cremur  ...                   True    Reviews
2   176067479                ELSIE  ...                  False    Reviews
3   172175820               Noemia  ...                   True    Reviews
4   163968274                Pmaze  ...                   True    Reviews
5   158405420                 Gary  ...                   True    Reviews
6   143025966                PMAZE  ...                   True    Reviews
7   139966209              Frenchy  ...                   True    Reviews
8   139766083           Arizona RN  ...                   True    Reviews
9   131870778                   GW  ...                   True    Reviews
10  120024401               Deekay  ...                   True    Reviews
11  119822871                 Tony  ...                   True    Reviews
12  116958004                MBPDX  ...                   True    Reviews
13  115487407             Smitty96  ...                   True    Reviews
14  110965961             chhappy7  ...                   True    Reviews
15  109184667             Tampafun  ...                   True    Reviews
16  101289834                Neile  ...                   True    Reviews
17   84350718               George  ...                   True    Reviews
18   75845132                  dav  ...                   True    Reviews
19   72639833                 Doug  ...                   True    Reviews
20   69174734               Carnut  ...                   True    Reviews
21   67191860                 Mark  ...                   True    Reviews
22   65876085                 bill  ...                  False    Reviews
23   64211472               Lazlow  ...                   True    Reviews
24   64008710                psyco  ...                   True    Reviews
25   57576670             vars0153  ...                  False    Reviews
26   57574924             Fernando  ...                  False    Reviews
27   50932030            anauditor  ...                   True    Reviews
28   50346331           Missct1964  ...                  False    Reviews
29   48468674               tekfoc  ...                   True    Reviews
30   48003934            BrwnJewel  ...                  False    Reviews
31   47955889               Free88  ...                   True    Reviews
32   47726965                 Josh  ...                   True    Reviews
33   47503009                Derek  ...                   True    Reviews
34   44513353                Don Z  ...                   True    Reviews
35   43143964               Raquel  ...                   True    Reviews
36   43142690            Pajama168  ...                   True    Reviews
37   40484198                   JJ  ...                   True    Reviews
38   39226477              fox4gib  ...                   True    Reviews
39   38915453     Happy in Chicago  ...                   True    Reviews
40   38485354            CLA owner  ...                   True    Reviews
41   35530044    1st time MB owner  ...                   True    Reviews
42   34931432                   CC  ...                   True    Reviews
43   34151324  First time MB buyer  ...                   True    Reviews
44   33259903                  tom  ...                   True    Reviews
45   32943654                 Yash  ...                   True    Reviews
46   32472645     TheMarcoIslander  ...                   True    Reviews

[47 rows x 33 columns]
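
If there are more than 100 reviews, the page parameter has to be walked. The query already asks for numPages, so the first response says how many more requests are needed. Below is a minimal sketch of that loop, assuming numPages comes back as an integer; fetch_all_reviews and fetch_page are hypothetical names, and fetch_page is any callable returning jsonData['data']['consumerreviews'] for one page, so the loop itself needs no network access.

```python
from typing import Any, Callable, Dict, List


def fetch_all_reviews(fetch_page: Callable[[int], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect the reviews from every page of the paginated API response.

    fetch_page(page) is a hypothetical helper that should return the
    "consumerreviews" object for that page number.
    """
    first = fetch_page(1)
    reviews = list(first["reviews"])
    # numPages is requested in the GraphQL query, so page 1 tells us when to stop
    for page in range(2, first["numPages"] + 1):
        reviews.extend(fetch_page(page)["reviews"])
    return reviews
```

To wire it to the request above, fetch_page could deep-copy payload (variables is a nested dict), set payload['variables']['page'] = page, send it with requests.post(url, json=...) and return .json()['data']['consumerreviews']. pd.DataFrame(fetch_all_reviews(fetch_page)) then builds the same table for any number of pages; keeping perPage at 100 keeps the number of requests low.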

Answer:

I don't see any major issue in your code block. However, class names like ehp7fkv0 are dynamic and are bound to change every time you access the web application afresh. The canonical approach is to avoid such dynamic values and rely on static attribute values instead, such as aria-label.
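
Matching on a static attribute can be wrapped in a small helper so the same label yields both a CSS and an XPath locator. This is a minimal sketch; aria_label_locators and xpath_literal are hypothetical names. The only subtle part is quoting: an XPath 1.0 string literal has no escape character, so a label containing both quote kinds has to be stitched together with concat().

```python
def xpath_literal(value: str) -> str:
    """Quote value as an XPath 1.0 string literal (XPath 1.0 has no escape character)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: split on ' and glue the pieces back with concat()
    pieces = [f"'{p}'" for p in value.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


def aria_label_locators(label: str, tag: str = "button") -> dict:
    """Build CSS and XPath locators that match on the static aria-label attribute."""
    # Escape backslashes and double quotes for a double-quoted CSS attribute value
    css_value = label.replace("\\", "\\\\").replace('"', '\\"')
    # "css selector" and "xpath" are the string values of By.CSS_SELECTOR and By.XPATH
    return {
        "css": ("css selector", f'{tag}[aria-label="{css_value}"]'),
        "xpath": ("xpath", f"//{tag}[@aria-label={xpath_literal(label)}]"),
    }
```

Either tuple can be passed straight to EC.element_to_be_clickable(...), which takes a (by, value) locator.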

To click() the element, induce WebDriverWait for element_to_be_clickable(); you can use either of the following locator strategies:

Using CSS_SELECTOR:

driver.get('https://www.kbb.com/mercedes-benz/cla/2018/consumer-reviews/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[aria-label='go to previouse page']"))).click()

Using XPATH:

driver.get('https://www.kbb.com/mercedes-benz/cla/2018/consumer-reviews/')    
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@aria-label='go to previouse page']"))).click()

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
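
A successful click() does not by itself mean the reviews have re-rendered, so a scraper usually also waits for the content to change. The sketch below shows that polling pattern in plain Python; wait_for_change is a hypothetical name, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time
from typing import Callable


def wait_for_change(read: Callable[[], str], old: str,
                    timeout: float = 10.0, poll: float = 0.5) -> str:
    """Call read() until it returns something other than old, then return that value.

    Raises TimeoutError if the value is still unchanged after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = read()
        if current != old:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"value did not change within {timeout} s")
        time.sleep(poll)
```

With Selenium, read could be lambda: driver.find_element(By.CSS_SELECTOR, REVIEW_SELECTOR).text, where REVIEW_SELECTOR is a hypothetical selector for the first review. Capture the text before click() and call wait_for_change afterwards. Selenium can also do the polling itself: WebDriverWait(driver, 20).until(...) accepts any callable that takes the driver, and EC.staleness_of(old_element) waits until the old element is detached from the DOM. If the content never changes, as described in the question, the wait simply times out, and fetching the data from the API as in the first answer is the more reliable route.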