Trying to scrape Aliexpress product reviews with Beautifulsoup

Time:08-19

I'm trying to collect AliExpress reviews from a product page, e.g. https://www.aliexpress.com/item/3256801798731854.html

I have written the following code to scrape this page and collect the reviews:

import requests
from bs4 import BeautifulSoup
from time import sleep

url = "https://www.aliexpress.com/item/3256801798731854.html"

response = requests.get(url).text

soup = BeautifulSoup(response, "html.parser")

reviews = soup.select("div.f-content dl.buyer-review dt.buyer-feedback")

for rev in reviews:
    rev_text = rev.find("span").text
    print(rev_text)
    sleep(1)

The problem is that when I run this code, nothing is printed in my terminal. I don't understand why my reviews variable is empty: print(reviews) prints an empty list.

What's wrong with my select statement in BeautifulSoup?

I also don't understand why reviews1 = soup.select("div.f-content") doesn't work (it prints an empty list) but reviews2 = soup.select("div", class_="f-content") works.

I have seen this problem in several of my projects, and I don't understand why reviews1 doesn't work when it is supposed to.

More generally, I would appreciate some guidance on my code so that I can collect reviews from any AliExpress product page.

CodePudding user response:

The main issue is that the reviews are rendered in an iframe, so they have to be requested from a different URL than the product page.

See also this answer, which covers a Selenium solution.
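One way to confirm this is to test whether the selector matches anything in the HTML that requests actually downloaded. The following is a minimal sketch, not a definitive check: has_match is a hypothetical helper name, and both HTML strings are made-up samples standing in for a page shell and for review markup.

```python
from bs4 import BeautifulSoup

SELECTOR = "div.f-content dl.buyer-review dt.buyer-feedback"


def has_match(html: str, selector: str = SELECTOR) -> bool:
    """Return True if the CSS selector matches at least one element."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(selector) is not None


# Made-up sample: a page shell whose reviews live in a separate iframe document.
shell_html = (
    '<html><body><div id="root"></div>'
    '<iframe src="//example.invalid/feedback"></iframe></body></html>'
)

# Made-up sample: markup shaped the way the selector expects.
review_html = """
<div class="f-content"><dl class="buyer-review">
  <dt class="buyer-feedback"><span>Good product</span></dt>
</dl></div>
"""

print(has_match(shell_html))
print(has_match(review_html))
```

If the check fails on requests.get(url).text for the product page, the selector is not the problem; the markup is simply not in that response.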

Example

import requests
from bs4 import BeautifulSoup
from time import sleep

url = "https://feedback.aliexpress.com/display/productEvaluation.htm?v=2&productId=1005001985046606&ownerMemberId=227141890&companyId=236776222&memberType=seller&startValidDate=&i18n=true"
response = requests.get(url).text
soup = BeautifulSoup(response, "html.parser")

reviews = soup.select("div.f-content dl.buyer-review dt.buyer-feedback")

for rev in reviews:
    rev_text = rev.find("span").text
    print(rev_text)
    sleep(1)
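The query string of that URL can also be assembled from its parts, which makes it easier to swap in another product. This is only a sketch: build_feedback_url is a hypothetical helper, the parameter names and values are copied from the URL above, and how to find the IDs for a different product is not covered here. The endpoint may also change over time.

```python
from urllib.parse import urlencode

BASE_URL = "https://feedback.aliexpress.com/display/productEvaluation.htm"


def build_feedback_url(product_id: str, owner_member_id: str, company_id: str) -> str:
    """Assemble the feedback URL from the IDs seen in the example URL."""
    params = {
        "v": "2",
        "productId": product_id,
        "ownerMemberId": owner_member_id,
        "companyId": company_id,
        "memberType": "seller",
        "startValidDate": "",  # left empty, as in the example URL
        "i18n": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_feedback_url("1005001985046606", "227141890", "236776222")
print(url)
```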

CodePudding user response:

Actually, the review data is generated dynamically from an external source: it is loaded via AJAX from an API using the POST method. So you have to use the API URL instead.

Example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

api_url = "https://feedback.aliexpress.com/display/productEvaluation.htm"
headers={
    'content-type':'application/x-www-form-urlencoded'
}

payload = {
    'ownerMemberId': '227141890',
    'memberType': 'seller',
    'productId': '1005001985046606',
    'companyId':'',
    'evaStarFilterValue': 'all Stars',
    'evaSortValue': 'sortdefault@feedback',
    'page': '2',
    'currentPage': '1',
    'startValidDate':'',
    'i18n': 'true',
    'withPictures': 'false',
    'withAdditionalFeedback': 'false',
    'onlyFromMyCountry': 'false',
    'version':'',
    'isOpened': 'true',
    'translate': 'Y',
    'jumpToTop': 'true',
    'v': '2'
    }

data = []
# Request each page of reviews in turn. The request has to happen inside the
# loop, otherwise the same response is parsed on every iteration.
for page in range(1, 26):
    payload['page'] = str(page)
    res = requests.post(api_url, data=payload, headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")

    reviews = soup.select("div.f-content dl.buyer-review dt.buyer-feedback")

    for rev in reviews:
        rev_text = rev.find("span").text
        data.append({
            'Reviews': rev_text
        })

df = pd.DataFrame(data)
print(df)

Output:

                         Reviews
0      Sent at once, it's for 18 days, all OK, I like.
1    Very useful, it is adjustable, it really makes...
2    Looks good on the material. Nice velcro pege. ...
3    I arrive in estimated time, good coordination ...
4    Excellent Product, arrived in 7 days to my cou...
..                                                 ...
245  good product, good price, work well. live it f...
246  In general, it is sewn well, but the threads a...
247  Like the description, yet will use to know if ...
248  Thing-awesome, take and do not think even, kee...
249                  Pas encore essayé, more semble as

[250 rows x 1 columns]
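The hard-coded range of 25 pages only fits this one product. A possible variation, sketched below under assumptions, separates parsing from fetching and stops at the first page that yields no reviews. The names parse_reviews, collect_reviews, fake_fetch and FAKE_PAGES are hypothetical, and the fake pages are made-up HTML so the sketch runs without network access.

```python
from typing import Callable, List

from bs4 import BeautifulSoup

SELECTOR = "div.f-content dl.buyer-review dt.buyer-feedback"


def parse_reviews(html: str) -> List[str]:
    """Extract the review texts from one page of feedback HTML."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for item in soup.select(SELECTOR):
        span = item.find("span")
        if span is not None:  # skip entries without a text span
            texts.append(span.get_text(strip=True))
    return texts


def collect_reviews(fetch_page: Callable[[int], str], max_pages: int = 25) -> List[str]:
    """Fetch pages 1..max_pages, stopping early at the first empty page."""
    collected: List[str] = []
    for page in range(1, max_pages + 1):
        page_reviews = parse_reviews(fetch_page(page))
        if not page_reviews:
            break
        collected.extend(page_reviews)
    return collected


def _fake_page(*texts: str) -> str:
    """Build made-up HTML shaped the way the selector expects."""
    items = "".join(
        f'<dt class="buyer-feedback"><span>{t}</span></dt>' for t in texts
    )
    return f'<div class="f-content"><dl class="buyer-review">{items}</dl></div>'


# Offline stand-in for the real POST request (assumption: two pages exist).
FAKE_PAGES = {1: _fake_page("Good product", "Arrived fast"), 2: _fake_page("Fits well")}


def fake_fetch(page: int) -> str:
    return FAKE_PAGES.get(page, "<html></html>")


reviews = collect_reviews(fake_fetch)
print(reviews)
```

In real use, fetch_page would wrap the POST request from the example above, setting payload['page'] to the requested page number and returning res.text.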

CodePudding user response:

Not an expert on BeautifulSoup, but try adding ">" after the first part of your selectors:

"div > .f-content" etc.

I think that should work.
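Note that the two selectors are not interchangeable: "div.f-content" matches a div that itself has the class, while "div > .f-content" matches any element with that class that is a direct child of a div. The sketch below illustrates the difference on made-up HTML, and also shows find_all, which is where the class_ keyword from the question normally belongs.

```python
from bs4 import BeautifulSoup

# Made-up HTML: one div carrying the class itself, one div wrapping a child
# element that carries the class.
html = """
<div class="f-content"><p>direct</p></div>
<div><section class="f-content"><p>child</p></section></div>
"""
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select("div.f-content")        # div elements with the class
by_child = soup.select("div > .f-content")     # classed children of a div
by_find_all = soup.find_all("div", class_="f-content")  # find_all equivalent

print([tag.name for tag in by_class])
print([tag.name for tag in by_child])
print([tag.name for tag in by_find_all])
```

Neither selector can match anything if the downloaded HTML does not contain the element in the first place.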

soup.select documentation
