I am trying to scrape questions and answers from workera.ai, but I am stuck because Selenium cannot find any element I search for by class name. The element is present in the page source, yet Selenium cannot find it. Here is what I am doing.
Sign up using: https://workera.ai/candidates/signup
from selenium import webdriver
from selenium.webdriver.chrome import service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time, os
option = webdriver.ChromeOptions()
option.add_argument("start-maximized")
option.add_experimental_option("excludeSwitches", ["enable-automation"])
option.add_experimental_option('useAutomationExtension', False)
option.add_argument("--disable-blink-features")
option.add_argument("--disable-gpu")
option.add_argument(r"--user-data-dir=C:\Users\user_name\AppData\Local\Google\Chrome\User Data") #e.g. C:\Users\You\AppData\Local\Google\Chrome\User Data
option.add_argument(r'--profile-directory=Profile 2') # using profile which is logged into the website
#option.add_argument("--headless")
option.add_argument('--disable-blink-features=AutomationControlled')
wd = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
skill_sets = ['https://workera.ai/app/learner/skillset/82746bf6-4eb2-4065-b2fb-740bc3207d14','https://workera.ai/app/learner/skillset/7553e8f8-52bf-4136-a4ea-6aa63eb963d9','https://workera.ai/app/learner/skillset/e11cb698-38c1-4a4f-aa7b-43b85bdf5a51','https://workera.ai/app/learner/skillset/a999048c-ab99-4576-b849-4e72c9455418','https://workera.ai/app/learner/skillset/7df84ad9-ae67-4faf-a981-a95c1c02adbb', 'https://workera.ai/app/learner/skillset/737fa250-8c66-4ea0-810b-6847c304aa5b','https://workera.ai/app/learner/skillset/ed4f2f1f-2333-4b28-b36a-c7f736da9647','https://workera.ai/app/learner/skillset/323ba5d9-fffe-48c0-b7b4-966d1ebca99a','https://workera.ai/app/learner/skillset/488492e9-53c4-4600-b336-6dfe44340402']
# AI fluent AI literate DATA ANAlyst DATA Engineer DATA scientist Deep learn ML Responsible AI Software Engineer
for skill in skill_sets:
    wd.get(skill)
    time.sleep(20)
    num = wd.find_element(By.CLASS_NAME, "sc-jNHgKk hrMhpT")  # class name is different for every account
    num = num.split('of')[1]
    num = int(num)
    print(num)
    button = wd.find_elements(By.CLASS_NAME, "styled__SBase-sc-cmjz60-0 styled__SPrimary-sc-cmjz60-1 kSmXiJ hwoYMb sc-fKVqWL eOjNfz")
    print(len(button))
wd.close()
I don't know why this is happening. Does the site block Selenium web drivers, or is it something else? Thanks in advance.
Edit: I tried getting the page source from Selenium and then accessing the elements with bs4, and that works. So I think the website is blocking Selenium by some means. If anyone knows the solution, please help. I will accept and upvote the answer.
CodePudding user response:
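The problem is that By.CLASS_NAME expects a single class name, so Selenium can't select an element when you pass several space-separated classes like this.
To select such an element, you can either pass just one of the classes in the value, or join the classes with "." instead of spaces, for example: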
wd.find_element(By.CLASS_NAME,"class1.class2")
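To make the "." form less error-prone, here is a minimal sketch that converts a class attribute copied from the page source into a compound CSS selector. The helper name class_selector is hypothetical (it is not part of Selenium), and the class string is the one from the question.

```python
def class_selector(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute is empty")
    # ".a.b" matches elements that carry both class "a" and class "b".
    return "".join("." + name for name in classes)


selector = class_selector("sc-jNHgKk hrMhpT")
print(selector)  # .sc-jNHgKk.hrMhpT
```

The resulting string can then be passed as wd.find_element(By.CSS_SELECTOR, selector), which states the intent more directly than relying on By.CLASS_NAME to accept dots.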
You can also select by a class that is common to all the answers, which I believe is "sc-jNHgKk", so you won't have to pick a different class for each account. Alternatively, you can use XPath instead.
num = int(wd.find_element(By.CLASS_NAME, "sc-jNHgKk").text.split("of ")[1])
button = wd.find_elements(By.CLASS_NAME, "styled__SBase-sc-cmjz60-0")
print(len(button))
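As a possible refinement, the fixed time.sleep(20) could be swapped for an explicit wait, so the lookup runs as soon as the element is visible. This is only a sketch under assumptions: the label text is assumed to look like "1 of 20" (as the split on "of" suggests), the function names parse_total and read_total are hypothetical, and the 20-second timeout simply mirrors the sleep in the question.

```python
def parse_total(label: str) -> int:
    """Extract the number after the last 'of' in a label such as '1 of 20'."""
    return int(label.rsplit("of", 1)[1])


def read_total(driver, css_selector: str, timeout: float = 20.0) -> int:
    """Wait until the element is visible, then parse the total from its text."""
    # Imported here so parse_total can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
    )
    return parse_total(element.text)


print(parse_total("1 of 20"))  # 20
```

With the driver from the question, this would be called as read_total(wd, ".sc-jNHgKk") inside the loop after wd.get(skill).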