I have a simple Selenium Python application with which I'm trying to scrape the categories, which are links. The problem I'm having is getting the links in the left pane to come through as a list using XPath. Additionally, I'd like to capture the line "class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context", but I'm not sure where to start with that, since it doesn't show up in the HTML or in Chrome DevTools.
I'm pulling data from the following website:
https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY TRACT AND METABOLISM|ATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4|atc,epc|dailymed,meshpa|mesh,disease|medrt,chem|dailymed,moa|dailymed,pe|dailymed,pk|medrt,tc|fmtsme,va|va,dispos|snomedct,struct|snomedct,schedule|rxnorm
My current code is below; the uncommented lines are the ones I'm actually running:
from selenium import webdriver
import pandas as pd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.proxy import Proxy, ProxyType
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
#service = Service('C:\Program Files\Chrome Driver\chromedriver.exe')
URL = "https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY TRACT AND METABOLISM|ATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4|atc,epc|dailymed,meshpa|mesh,disease|medrt,chem|dailymed,moa|dailymed,pe|dailymed,pk|medrt,tc|fmtsme,va|va,dispos|snomedct,struct|snomedct,schedule|rxnorm"
driver = webdriver.Chrome('C:\Program Files\Chrome Driver\chromedriver.exe')
driver.get(URL)
category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)
#WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'tr.dbsearch')))
#pd.read_html(driver.page_source)[1].iloc[:,:-1].to_csv('table.csv',index=False)
#time.sleep(8)
#driver.quit()
Additionally, I've been trying to get the content on the page which is shown as:
class: ALIMENTARY TRACT AND METABOLISM / id: A / class type: ATC1-4 / show context
How can I access that text? Everything I've tried gives a "no such element" or "no such class name" error. The main problem is that I'm not sure how to find the names of these elements or classes in the JavaScript if they don't exist in the HTML or in the Elements panel of Chrome DevTools.
The error message I get when running the following is:
category = driver.find_elements_by_class_name(By.XPATH, "//div[@class='service drug_class']//a")
print(category)
TypeError: find_elements_by_class_name() takes 2 positional arguments but 3 were given
CodePudding user response:
To print the text of the links on the left side of the page, you have to wait for them with WebDriverWait and visibility_of_all_elements_located(). You can use the following locator strategy:
Using CSS_SELECTOR:
driver.get("https://mor.nlm.nih.gov/RxClass/search?query=ALIMENTARY TRACT AND METABOLISM|ATC1-4&searchBy=class&sourceIds=a&drugSources=atc1-4|atc,epc|dailymed,meshpa|mesh,disease|medrt,chem|dailymed,moa|dailymed,pe|dailymed,pk|medrt,tc|fmtsme,va|va,dispos|snomedct,struct|snomedct,schedule|rxnorm")
# Wait up to 20 seconds for the links in the class pane to become visible, then print their text.
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.drug_class a")))])
Console Output:
['Anatomical Therapeutic Chemical (ATC1-4)', 'ALIMENTARY TRACT AND METABOLISM (397)', 'ANABOLIC AGENTS FOR SYSTEMIC USE (9)', 'ANTIDIARRHEALS, INTESTINAL ANTIINFLAMMATORY/ANTIINFECTIVE AGENTS (44)', 'ANTIEMETICS AND ANTINAUSEANTS (13)', 'ANTIOBESITY PREPARATIONS, EXCL. DIET PRODUCTS (12)', 'BILE AND LIVER THERAPY (13)', 'DIGESTIVES, INCL. ENZYMES (7)', 'DRUGS FOR ACID RELATED DISORDERS (35)', 'DRUGS FOR CONSTIPATION (39)', 'DRUGS FOR FUNCTIONAL GASTROINTESTINAL DISORDERS (47)', 'DRUGS USED IN DIABETES (69)', 'MINERAL SUPPLEMENTS (30)', 'OTHER ALIMENTARY TRACT AND METABOLISM PRODUCTS (41)', 'STOMATOLOGICAL PREPARATIONS (31)', 'TONICS (0)', 'VITAMINS (23)', 'BLOOD AND BLOOD FORMING ORGANS (158)', 'CARDIOVASCULAR SYSTEM (326)', 'DERMATOLOGICALS (242)', 'GENITO URINARY SYSTEM AND SEX HORMONES (160)', 'SYSTEMIC HORMONAL PREPARATIONS, EXCL. SEX HORMONES AND INSULINS (66)', 'ANTIINFECTIVES FOR SYSTEMIC USE (334)', 'ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS (324)', 'MUSCULO-SKELETAL SYSTEM (130)', 'NERVOUS SYSTEM (433)', 'ANTIPARASITIC PRODUCTS, INSECTICIDES AND REPELLENTS (77)', 'RESPIRATORY SYSTEM (213)', 'SENSORY ORGANS (174)', 'VARIOUS (137)', 'Established Pharmacologic Classes (EPC) [from DailyMed]', 'MeSH Pharmacologic Actions (MESHPA)', 'Diseases, Life Phases, Behavior Mechanisms and Physiologic States', 'Substances and Cells (CHEM) [from DailyMed]', 'Mechanism of Action (MoA) [from DailyMed]', 'Physiologic Effect (PE) [from DailyMed]', 'Pharmacokinetics (PK)', 'VA Classes (VA)', 'Therapeutic Categories (TC)', 'Disposition (DISPOS) [from SNOMEDCT]', 'Structure (STRUCT) [from SNOMEDCT]', 'CSA Schedule (SCHEDULE)']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
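For what it's worth, the TypeError in the question arises because find_elements_by_class_name() accepts a single class name, while the code passes both By.XPATH and an XPath string. A (strategy, value) pair goes to the generic find_elements() instead. Below is a minimal sketch of that corrected call, reusing the XPath from the question. The helper names category_link_texts and split_label are made up for illustration, and the idea that a label ends in a parenthesised count is an assumption based only on the console output above.

```python
import re

# XPath from the question: every link inside the left-hand class pane.
CATEGORY_LINKS_XPATH = "//div[@class='service drug_class']//a"

# Assumed label shape, e.g. "DRUGS USED IN DIABETES (69)".
_LABEL_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<count>\d+)\)$")


def category_link_texts(driver, timeout=20):
    """Return the non-empty text of each link in the class pane.

    `driver` is an already-started Selenium WebDriver that has loaded the page.
    """
    # Imported lazily so split_label() below can be used without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Wait until at least one matching link is present in the DOM.
    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, CATEGORY_LINKS_XPATH))
    )
    # Corrected call: the generic find_elements() takes (strategy, value).
    links = driver.find_elements(By.XPATH, CATEGORY_LINKS_XPATH)
    return [link.text for link in links if link.text.strip()]


def split_label(label):
    """Split 'NAME (123)' into ('NAME', 123); return (label, None) otherwise."""
    label = label.strip()
    match = _LABEL_RE.match(label)
    if match is None:
        return label, None
    return match.group("name"), int(match.group("count"))
```

With a live driver this would be called as category_link_texts(driver) after driver.get(URL); each returned string can then be passed through split_label() if the counts are wanted as separate numbers.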
CodePudding user response:
It seems you are looking for the strong tag, while all the links on the left are a (anchor) elements. That means you are not going to find them with strong.
Basically, this is the XPath you are looking for to get any link:
//div[@class='service drug_class']//a[text()='Any link text here']
Replace "Any link text here" with the exact link text.
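If you need this for many links, the substitution can be wrapped in a small helper. The sketch below is only an illustration: xpath_literal and link_xpath are hypothetical names, not part of Selenium. XPath 1.0 has no escape character inside string literals, so text containing an apostrophe needs double quotes, and text containing both quote types needs concat(). Whether the count, for example " (23)", belongs to the link's own text node on this page is an assumption you would have to check.

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types occur: split on apostrophes and stitch the pieces
    # back together with concat(), quoting each apostrophe separately.
    pieces = [f"'{part}'" for part in text.split("'")]
    return "concat(" + ', "\'", '.join(pieces) + ")"


def link_xpath(link_text):
    """Build the XPath for the pane link whose text equals `link_text`."""
    return (
        "//div[@class='service drug_class']"
        f"//a[text()={xpath_literal(link_text)}]"
    )
```

The result can then be handed to Selenium, for example driver.find_element(By.XPATH, link_xpath("DRUGS USED IN DIABETES")).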