I'm trying to collect AliExpress reviews from a product page, e.g. https://www.aliexpress.com/item/3256801798731854.html
I have written the following code to scrape this page and collect the reviews:
import requests
from bs4 import BeautifulSoup
from time import sleep
url = "https://www.aliexpress.com/item/3256801798731854.html"
response = requests.get(url).text
soup = BeautifulSoup(response, "html.parser")
reviews = soup.select("div.f-content dl.buyer-review dt.buyer-feedback")
for rev in reviews:
    rev_text = rev.find("span").text
    print(rev_text)
    sleep(1)
The problem is that when I run this code, nothing is printed in my terminal.
I don't understand why my reviews variable is empty: print(reviews) prints an empty list.
What's wrong with my select statement in BeautifulSoup?
I also don't understand why reviews1 = soup.select("div.f-content") doesn't work (it prints an empty list),
while reviews2 = soup.select("div", class_="f-content") works.
I have seen this problem in several of my projects, and I don't understand why reviews1 fails when it is supposed to work.
More generally, I would appreciate some guidance on my code so that I can collect reviews from any AliExpress product page.
CodePudding user response:
The main issue here is that the reviews are rendered in an iframe,
so they are served from a different URL than the product page.
See also this answer, which covers a Selenium solution.
Example
import requests
from bs4 import BeautifulSoup
from time import sleep
url = "https://feedback.aliexpress.com/display/productEvaluation.htm?v=2&productId=1005001985046606&ownerMemberId=227141890&companyId=236776222&memberType=seller&startValidDate=&i18n=true"
response = requests.get(url).text
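# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(response, "html.parser")
reviews = soup.select("div.f-content dl.buyer-review dt.buyer-feedback")
for rev in reviews:
    rev_text = rev.find("span").text
    print(rev_text)
    sleep(1)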
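The iframe point can be checked offline. The sketch below uses made-up, simplified markup (the `feedback.example.com` URL and both HTML strings are assumptions, not AliExpress's real page) to show that BeautifulSoup only sees the `<iframe>` tag in the outer document, never the document it points to. Whether the real product page even contains the iframe in its static HTML is not something this sketch can confirm.

```python
from bs4 import BeautifulSoup

# Hypothetical outer page: it only carries an <iframe> pointing elsewhere.
outer_html = """
<div id="reviews">
  <iframe src="https://feedback.example.com/display/productEvaluation.htm?productId=123"></iframe>
</div>
"""

# Hypothetical document served at the iframe URL: this is where the reviews live.
inner_html = """
<div class="f-content">
  <dl class="buyer-review">
    <dt class="buyer-feedback"><span>Great product</span></dt>
  </dl>
</div>
"""

selector = "div.f-content dl.buyer-review dt.buyer-feedback"

outer = BeautifulSoup(outer_html, "html.parser")
inner = BeautifulSoup(inner_html, "html.parser")

# The review selector finds nothing in the outer page...
outer_matches = outer.select(selector)
# ...but the outer page does expose the URL that has to be requested instead.
iframe_srcs = [tag["src"] for tag in outer.select("iframe[src]")]
# The same selector works once it is applied to the iframe's own document.
inner_texts = [dt.find("span").get_text(strip=True) for dt in inner.select(selector)]

print(outer_matches)
print(iframe_srcs)
print(inner_texts)
```

If `iframe_srcs` comes back empty on a real page, the iframe is probably injected by JavaScript, and the URL has to be found in the browser's network tab instead.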
CodePudding user response:
Actually, the review data is loaded dynamically via AJAX from an external source:
the page sends a POST request to an API. So you have to use the API URL instead.
Example:
import requests
from bs4 import BeautifulSoup
import pandas as pd
api_url = "https://feedback.aliexpress.com/display/productEvaluation.htm"
headers = {
    'content-type': 'application/x-www-form-urlencoded'
}
payload = {
    'ownerMemberId': '227141890',
    'memberType': 'seller',
    'productId': '1005001985046606',
    'companyId': '',
    'evaStarFilterValue': 'all Stars',
    'evaSortValue': 'sortdefault@feedback',
    'page': '2',
    'currentPage': '1',
    'startValidDate': '',
    'i18n': 'true',
    'withPictures': 'false',
    'withAdditionalFeedback': 'false',
    'onlyFromMyCountry': 'false',
    'version': '',
    'isOpened': 'true',
    'translate': 'Y',
    'jumpToTop': 'true',
    'v': '2'
}
data = []
# Send the request inside the loop so that each iteration parses a fresh page.
for page in range(1, 26):
    payload['page'] = str(page)
    res = requests.post(api_url, data=payload, headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    reviews = soup.select("div.f-content dl.buyer-review dt.buyer-feedback")
    for rev in reviews:
        rev_text = rev.find("span").text
        data.append({
            'Reviews': rev_text
        })
df = pd.DataFrame(data)
print(df)
Output:
Reviews
0 Sent at once, it's for 18 days, all OK, I like.
1 Very useful, it is adjustable, it really makes...
2 Looks good on the material. Nice velcro pege. ...
3 I arrive in estimated time, good coordination ...
4 Excellent Product, arrived in 7 days to my cou...
.. ...
245 good product, good price, work well. live it f...
246 In general, it is sewn well, but the threads a...
247 Like the description, yet will use to know if ...
248 Thing-awesome, take and do not think even, kee...
249 Pas encore essayé, more semble as
[250 rows x 1 columns]
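The pagination above hard-codes 25 pages. A possible variation, sketched here under assumptions, separates parsing from fetching and stops at the first page that yields no reviews. The names `parse_reviews`, `collect_reviews`, and `fake_fetch` are hypothetical helpers introduced for illustration, and `fake_fetch` is an offline stand-in for the real POST request, so the example runs without network access.

```python
from typing import Callable, List

from bs4 import BeautifulSoup

SELECTOR = "div.f-content dl.buyer-review dt.buyer-feedback"


def parse_reviews(html: str) -> List[str]:
    """Extract the review texts from one page of feedback HTML."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for dt in soup.select(SELECTOR):
        span = dt.find("span")
        if span is not None:  # skip entries that have no text span
            texts.append(span.get_text(strip=True))
    return texts


def collect_reviews(fetch_page: Callable[[int], str], max_pages: int = 25) -> List[str]:
    """Fetch pages 1..max_pages, stopping early at the first empty page."""
    collected = []
    for page in range(1, max_pages + 1):
        page_reviews = parse_reviews(fetch_page(page))
        if not page_reviews:
            break
        collected.extend(page_reviews)
    return collected


def fake_fetch(page: int) -> str:
    """Offline stand-in for the network call: two pages of data, then nothing."""
    if page > 2:
        return "<html></html>"
    return (
        '<div class="f-content"><dl class="buyer-review">'
        f'<dt class="buyer-feedback"><span>review from page {page}</span></dt>'
        "</dl></div>"
    )


reviews = collect_reviews(fake_fetch)
print(reviews)
```

For real use, `fetch_page` could wrap the `requests.post` call from the example above, setting `payload['page']` to the requested page number and returning `res.text`.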
CodePudding user response:
I'm not an expert on BeautifulSoup, but try adding ">" after the first part of your selectors,
e.g. "div > .f-content".
I think that should work.
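To see what each selector form actually matches, here is a small sketch on made-up markup (the HTML and the `ids` helper are assumptions for illustration). Note that `div.f-content` and `div > .f-content` are different selectors: the first matches a `div` that itself has the class, the second matches any element with the class that is a direct child of a `div`. As far as I know, `class_` is a keyword of `find_all()`/`find()`, so I would not rely on it with `select()`, where a plain `"div"` selector matches every `div` regardless of class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with ids so the matches are easy to tell apart.
html = """
<div class="f-content" id="a">
  <p class="f-content" id="b">nested</p>
</div>
<div class="other" id="c"></div>
"""
soup = BeautifulSoup(html, "html.parser")


def ids(tags):
    """Return the id attribute of each matched tag."""
    return [t["id"] for t in tags]


# A div that itself carries the class f-content.
compound = ids(soup.select("div.f-content"))
# Any element with class f-content that is a direct child of a div.
child = ids(soup.select("div > .f-content"))
# The find_all() equivalent of the compound selector.
by_class = ids(soup.find_all("div", class_="f-content"))
# A bare tag selector ignores classes entirely.
all_divs = ids(soup.select("div"))

print(compound, child, by_class, all_divs)
```

So if `soup.select("div.f-content")` returns an empty list, the likelier explanation is that the fetched HTML contains no such `div` at all, not that the selector syntax is wrong.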