I am trying to scrape the usernames from the "Followers" button list on this profile, using Python Selenium. I am not able to do this for 2 reasons:
- I can't scroll the list by using
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
because the list has 2 scrollbars (I don't know why it has 2). If I try to scroll, it scrolls the profile page and not the actual list.
- Even if I manage to scroll the list, how am I supposed to store the usernames? The users are loaded dynamically and, for some reason, the class attribute looks like this:
class='st--c-PJLV st--c-dhzjXW st--c-edagZx'
I've tried several ways of solving this but haven't been able to get the result I want, so any help is appreciated. Here are some code snippets I tried, which raised an error:
scrollElem = driver.find_elements(By.XPATH, "//div[@class='st--c-PJLV st--c-dhzjXW st--c-edagZx']/a")
followernumber = 2000
scrollElem[len(scrollElem)-1].location_once_scrolled_into_view
for i in range(0,followernumber):
    new = len(scrollElem) + i
    newname = driver.find_element(By.XPATH, "(//div[@class='st--c-PJLV st--c-dhzjXW st--c-edagZx']/a)[%i]"%new)
    print(newname.text, i)
    newname.location_once_scrolled_into_view
    time.sleep(1)
Got the error: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"(//div[@class='st--c-PJLV st--c-dhzjXW st--c-edagZx']/a)[47]"}
I also tried scrolling to the bottom of the list with this algorithm and storing the elements as they load, but that didn't work either:
def scrollDown():
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(SCROLL_PAUSE_TIME)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
The algorithm scrolled the profile page and not the list of followers.
I would appreciate any help as I'm new to web-scraping!
CodePudding user response:
Try using the requests module to get all the follower names of that profile:
import requests
link = 'https://hasura2.foundation.app/v1/graphql'
payload = {"query":"query userFollowersQuery($publicKey: String!, $currentUserPublicKey: String!, $offset: Int!, $limit: Int!) {\n follows: follow(\n where: {followedUser: {_eq: $publicKey}, isFollowing: {_eq: true}}\n offset: $offset\n limit: $limit\n ) {\n id\n user: userByFollowingUser {\n name\n username\n profileImageUrl\n userIndex\n publicKey\n follows(where: {user: {_eq: $currentUserPublicKey}, isFollowing: {_eq: true}}) {\n createdAt\n isFollowing\n }\n }\n }\n}\n","variables":{"currentUserPublicKey":"","publicKey":"0xF74d1224931AFa9cf12D06092c1eb1818D1E255C","offset":0,"limit":48},"operationName":"userFollowersQuery"}
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    while True:
        resp = s.post(link, json=payload)
        follows = resp.json()['data']['follows']
        # An empty page means every follower has been fetched.
        if not follows:
            break
        for item in follows:
            print(item['user']['username'])
        # Move to the next page; 48 matches the "limit" value in the payload.
        payload['variables']['offset'] += 48
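If you would rather stay with Selenium, the likely reason the scrolling attempts in the question moved the profile page is that `window.scrollTo` acts on the document, while the followers list scrolls inside its own container element. Below is a minimal sketch of scrolling such a container instead. The function name `scroll_container_to_end` and its `pause` and `max_rounds` parameters are made up for illustration, and it assumes you have already located the scrollable element yourself (for example with the browser's dev tools); I have not checked it against this particular site.

```python
import time


def scroll_container_to_end(driver, container, pause=1.0, max_rounds=100):
    """Scroll `container` (not the window) until its scrollHeight stops growing.

    `driver` is a Selenium WebDriver and `container` is the WebElement that
    owns the list's scrollbar. Both names are placeholders for this sketch.
    """
    last_height = driver.execute_script(
        "return arguments[0].scrollHeight", container
    )
    for _ in range(max_rounds):
        # Scroll the element itself; window.scrollTo would move the page instead.
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", container
        )
        time.sleep(pause)  # give lazily loaded rows time to arrive
        new_height = driver.execute_script(
            "return arguments[0].scrollHeight", container
        )
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end was reached
        last_height = new_height
    return last_height
```

After it returns, a single `driver.find_elements(...)` call on the list items should pick up whatever is in the DOM. If the page removes off-screen rows while you scroll, you would need to collect the names into a set inside the loop instead. The generated-looking class names from the question may change between deployments, so a locator based on structure is probably safer than one based on those classes.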