Trouble scraping all the follower names from a profile page using requests

Time:11-05

I'm trying to scrape all the follower names from a profile page using the requests module. The problem is that when I run the script below, I get the first 20 names over and over again.

The payload sent with the POST request has only two keys: size (for example 20) and continuation (a timestamp). I tried to set both of them correctly, but I still get the same results repeatedly.

import time
import requests

link = 'https://api-mainnet.rarible.com/marketplace/api/v4/followers'

params = {'user': '0xe744d23107c9c98df5311ff8c1c8637ec3ecf9f3'}
payload = {"size": 20}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
    'origin': 'https://rarible.com',
    'referer': 'https://rarible.com/'
}

with requests.Session() as s:
    s.headers.update(headers)
    
    while True:
        res = s.post(link,params=params,json=payload)
        print(s.headers)
        for item in res.json():
            print(item['owner'].get('name',''))

        payload['continuation'] = f"{int(time.time() * 1000)}"
        time.sleep(2)

How can I parse all the follower names from that page using requests?

CodePudding user response:

Some APIs cap how many items you can extract, or split the results into pages with a size limit. For me, just increasing the size value in the payload worked with your code.

import time
import requests

link = 'https://api-mainnet.rarible.com/marketplace/api/v4/followers'

params = {'user': '0xe744d23107c9c98df5311ff8c1c8637ec3ecf9f3'}
payload = {"size": 10000}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
    'origin': 'https://rarible.com',
    'referer': 'https://rarible.com/'
}

with requests.Session() as s:
    s.headers.update(headers)

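    res = s.post(link,params=params,json=payload)
    res.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error body
    followers = res.json()  # parse the JSON body once and reuse it
    print(len(followers))
    for item in followers:
        print(item['owner'].get('name',''))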
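
If the server ever caps size below the number of followers, one large request will no longer be enough and you would have to page through the results. Below is a rough sketch of such a loop, not the API's documented behaviour. It assumes that the continuation value has to be derived from the previous response rather than from the local clock, which is only a guess; how to derive it is not shown anywhere in this thread, so it is left as a hypothetical next_token callable. The names iter_followers, fetch_page, next_token, max_pages and demo are all made up for illustration, and the demo uses an in-memory fake instead of the real endpoint.

```python
from typing import Callable, Iterator, List, Optional


def iter_followers(
    fetch_page: Callable[[dict], List[dict]],
    next_token: Callable[[List[dict]], Optional[str]],
    size: int = 20,
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield items page by page until the server stops returning new ones."""
    payload = {"size": size}
    previous = None
    for _ in range(max_pages):  # hard cap so a misbehaving server cannot loop us forever
        items = fetch_page(dict(payload))
        # Stop on an empty page, or when the same page comes back again
        # (the "first 20 names over and over" symptom from the question).
        if not items or items == previous:
            return
        yield from items
        previous = items
        token = next_token(items)  # hypothetical: how the token is derived is an assumption
        if token is None:
            return
        payload["continuation"] = token


def demo() -> List[str]:
    # Fake in-memory "server" so the sketch runs without network access.
    data = [{"id": str(i), "owner": {"name": f"user{i}"}} for i in range(45)]

    def fake_fetch(payload: dict) -> List[dict]:
        start = int(payload.get("continuation", 0))
        return data[start:start + payload["size"]]

    def fake_token(items: List[dict]) -> Optional[str]:
        # In this fake, the token is simply the index after the last item seen.
        return str(int(items[-1]["id"]) + 1)

    return [item["owner"].get("name", "") for item in iter_followers(fake_fetch, fake_token)]


if __name__ == "__main__":
    print(len(demo()))
```

To try it against the real endpoint, fetch_page could wrap the existing call, for example lambda payload: s.post(link, params=params, json=payload).json(), while next_token would have to be worked out by watching which continuation value the site itself sends for the second page.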