I work in cross-border e-commerce and want to use Python to collect the ASINs of all products in an Amazon ranking. Please bear with me, I am a Python beginner. My preliminary code is at the end of the post; the problems are as follows:
1. The URLs follow a regular pattern (only two parts change: pg_1 and pg=1). How can I add a loop that generates them, instead of writing both links into urls by hand?
2. With the two pages, often only one page gets crawled, or page 1 cannot be fetched at all, and the printed result is an empty set().
3. I want to export the scraped data to a CSV file:
Cell A1 of the CSV should contain "ASIN"
Cells A2 to A101 should each contain one ASIN
Thank you very much to anyone willing to help.
import requests
import re

urls = [
    'https://www.amazon.com/gp/movers-and-shakers/automotive/ref=zg_bsms_pg_1?ie=UTF8&pg=1',
    'https://www.amazon.com/gp/movers-and-shakers/automotive/ref=zg_bsms_pg_2?ie=UTF8&pg=2'
]
for url in urls:
    content = requests.get(url).content
    decoded_content = content.decode()
    # Capture whatever follows "/dp/" up to the next quote or question mark
    asins = set(re.findall(r'/[^/]+/dp/([^"?]+)', decoded_content))
    print(asins)
CodePudding user response:
You also need to make the request look like it comes from a browser by sending your local machine's browser information (request headers such as User-Agent); Amazon has an anti-crawling mechanism.
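A minimal sketch of that idea, to show where the headers go. It uses urllib.request from the standard library so it runs without extra installs; with requests, the same dictionary would be passed as requests.get(url, headers=HEADERS). The header values, the helper names (build_request, fetch_html, extract_asins) and the pattern are assumptions for illustration: the pattern assumes an ASIN is the 10 uppercase letters or digits right after "/dp/". Browser-like headers may reduce blocked responses but do not guarantee that Amazon serves the real page.

```python
import re
import urllib.request

# Assumed browser-like headers; the exact values are illustrative only.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

# Assumption: an ASIN is the 10 uppercase alphanumeric characters after "/dp/".
ASIN_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})")


def build_request(url):
    """Attach the browser-like headers to a request (no network access yet)."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url, timeout=10):
    """Download one page and return its text; raises on HTTP errors."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_asins(html):
    """Return the set of ASIN-like strings found in the page source."""
    return set(ASIN_PATTERN.findall(html))
```

If extract_asins(fetch_html(url)) still comes back as set(), printing the first few hundred characters of the returned HTML usually shows whether a robot-check or error page was served instead of the ranking.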
CodePudding user response:
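For questions 1 and 3, a rough sketch under a few assumptions: the page number is the only part of the URL that changes, and the ASINs have already been extracted into a Python list. The names BASE_URL, build_urls, unique_in_order and write_asins are made up for this example. Note that set() discards order, so for a ranking it is probably better to remove duplicates while keeping the order in which the ASINs appear.

```python
import csv

# The page number appears twice in the URL (pg_N and pg=N).
BASE_URL = (
    "https://www.amazon.com/gp/movers-and-shakers/automotive/"
    "ref=zg_bsms_pg_{page}?ie=UTF8&pg={page}"
)


def build_urls(first_page=1, last_page=2):
    """Generate one URL per page instead of listing them by hand."""
    return [BASE_URL.format(page=n) for n in range(first_page, last_page + 1)]


def unique_in_order(items):
    """Drop duplicates but keep first-seen order (a set would lose the ranking)."""
    return list(dict.fromkeys(items))


def write_asins(asins, path="asins.csv"):
    """Write "ASIN" into A1 and one ASIN per row below it."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["ASIN"])
        for asin in asins:
            writer.writerow([asin])
```

In use, the ASINs found on each URL from build_urls() would be appended to one list, passed through unique_in_order(), and then handed to write_asins(); opening asins.csv in a spreadsheet then shows the header in A1 and the ASINs from A2 downward.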